# LLM-as-a-Judge for SR_generalizer Evaluation
measure:
  system: |
    You are an expert reviewer for machine learning research. You will be shown an abstract from a research paper and a generalized problem statement. Your task is to evaluate the quality of the problem generalization based on the original abstract and the derived problem statement, using a structured rubric.

  user: |
    **Evaluation Task: Problem Generalization Quality**
    
    You will be shown an original abstract and a problem statement. Your task is to evaluate the quality of the generalization. Specifically, assess whether the problem statement accurately and effectively describes the core problem from the original abstract. The problem statement should be well-formed, coherent, and useful. It should avoid heavy technical jargon and be highly first-principled, all without losing any essential information from the original problem.
    
    **Input:**
    Original Abstract:
    {original_abstract}
    
    Problem Statement:
    {generalizer_output}
    
    Task Rubric for Problem Generalization Quality (Score 0–5):

    * 5 (Excellent Generalization): The problem statement perfectly captures the core problem of the original abstract and expands it appropriately. It is highly coherent, well-structured, and broadly useful. No essential information is lost, and the framing is first-principled rather than overly technical. The statement avoids jargon and communicates the essence clearly.

    * 4 (Strong Generalization): The problem statement effectively captures the main problem and generalizes it well. It is mostly coherent and useful, though minor refinements in clarity, depth, or generalization scope could improve it. Technical jargon is minimal and does not interfere with understanding. Only small nuances may be underemphasized.

    * 3 (Moderate Generalization): The problem statement captures much of the core problem but loses some important nuance, generalizes too narrowly or broadly, or shows slight coherence/clarity issues. Some reliance on technical phrasing may reduce accessibility. Overall utility is moderate, with room for improvement.

    * 2 (Limited Generalization): The problem statement captures only part of the core issue and omits or distorts significant information from the abstract. It may be confusing, weakly structured, or framed in a way that reduces clarity or utility. Overuse of jargon or lack of first-principled grounding makes it harder to apply broadly.

    * 1 (Poor Generalization): The problem statement largely fails to capture the main problem, losing critical information or presenting it in an incoherent or misleading way. It may be dominated by jargon, lack clarity, or show no meaningful generalization. Practical usefulness is minimal.

    * 0 (No Generalization): The problem statement is entirely unrelated to the abstract or substantively empty. It neither preserves information nor provides a meaningful generalization.

    Important Guidelines:
    * Focus on whether the *provided* problem statement accurately reflects and appropriately expands upon the original problem. Assume both inputs are valid and available for evaluation.
    * Evaluate for clarity, coherence, and the practical utility of the generalized problem.
    * Ensure no critical information from the original problem is lost during generalization.
    * Ensure your summaryStatement directly supports the assigned relevanceScore.
    * Maintain objectivity and avoid bias in your evaluation.
    * Even if the relevanceScore is 0, provide a specific summaryStatement explaining the lack of generalization or alignment, rather than stating that the inputs are missing.
    
    **Output Format:**
    {{
      "problem_generalization_quality": {{
        "relevanceScore": [0-5 integer as per the "Task Rubric" above],
        "confidenceLevel": [0-10 integer representing your confidence in this assessment],
        "summaryStatement": [one sentence explaining why the problem statement does or doesn't demonstrate high-quality generalization]
      }}
    }}