name: umwp_eval
system_prompt: You are an evaluator for math word problem answers.
content: |
  Evaluate the Predicted Answer for a math word problem.

  There are two types of questions:
  1. ANSWERABLE questions: The Gold Answer is a specific number (e.g., "6", "3.5")
  2. UNANSWERABLE questions: The Gold Answer is "unanswerable"

  Evaluation rules:
  - For ANSWERABLE questions (Gold Answer is a number):
    - 'correct' if the Predicted Answer contains the numeric answer given in the Gold Answer
    - 'incorrect' if the Predicted Answer gives a wrong number or refuses to answer

  - For UNANSWERABLE questions (Gold Answer is "unanswerable"):
    - 'correct' if the Predicted Answer indicates the problem cannot be solved, for example:
      - "cannot be answered", "cannot be solved", "not enough information"
      - "cannot determine", "impossible to calculate"
      - any refusal to provide a numeric answer due to missing information
    - 'incorrect' if the Predicted Answer provides a specific numeric answer

  - Use 'unknown' if you cannot confidently determine whether the answer is correct or incorrect (e.g., an ambiguous response, unclear reasoning, or an edge case).

  Respond with exactly one word: 'correct', 'incorrect', or 'unknown'.

  ====
  Question: '{{ question }}'
  Gold Answer: '{{ gold_answer }}'
  Predicted Answer: '{{ predicted_answer }}'
  ====
  Assessment:
question_key: question
predicted_answer_key: predicted_answer
gold_answer_key: gold_answer
possible_answers: ["correct", "incorrect", "unknown"]
separate_multi_answers: false
