version: "v1"
created: "2025-01-30T00:00:00"
description: "Three-stage math reasoning prompts with unified adaptive decision format"
scaffold_type: "three_stage"
git_hash: null
experiment_id: null
author: "<ANONYMIZED>"

# VLM prompts for mathematical image description (for training the captioner)
vlm_initial_description_prompt: |
  I need your help analyzing this image to prepare for answering the following question: 

  {question}

  IMPORTANT: DO NOT answer the question directly. Instead, provide a comprehensive and detailed description of everything visible in the image that could be relevant for answering this question.

  Focus on describing:
  - All objects, people, text, and visual elements in the image
  - Spatial relationships between different elements
  - Any text content that is visible, transcribed exactly
  - Colors, shapes, patterns, and visual attributes
  - Relevant contextual details and background information

  Your description should be detailed enough that someone could mentally reconstruct the image without seeing it, but DO NOT provide step-by-step instructions on how to recreate it.

# VLM confidence estimation prompt (auxiliary prompt for the confidence experiment; the CONFIDENCE field reports estimated question difficulty)
vlm_confidence_prompt: |
  I need your help analyzing this image to prepare for answering the following question: 

  {question}

  IMPORTANT: DO NOT answer the question directly. Instead, provide a comprehensive and detailed description of everything visible in the image that could be relevant for answering this question.

  Focus on describing:
  - All objects, people, text, and visual elements in the image
  - Spatial relationships between different elements
  - Any text content that is visible, transcribed exactly
  - Colors, shapes, patterns, and visual attributes
  - Relevant contextual details and background information

  Your description should be detailed enough that someone could mentally reconstruct the image without seeing it, but DO NOT provide step-by-step instructions on how to recreate it.

  Now, considering both the question and what you see in the image, assess: **How difficult would this question be for an average college student?**

  Consider factors like:
  - Mathematical complexity and required knowledge level
  - How much visual reasoning and spatial understanding is needed
  - Whether the problem involves complex multi-step reasoning
  - How easy it would be to make errors or get confused
  - Whether this requires specialized knowledge or just basic skills

  Rate the difficulty level honestly:

  **20%**: "Very easy - basic math or simple visual identification that most people would get right immediately"

  **40%**: "Easy-moderate - straightforward problem requiring basic skills, low chance of errors"

  **60%**: "Moderate - requires some thinking and careful work, average students might struggle a bit"

  **80%**: "Difficult - complex reasoning, multiple steps, high chance of making mistakes even with good skills"

  **95%**: "Very difficult - advanced concepts, intricate reasoning, most students would likely get this wrong"

  Please format your response as follows:
  DESCRIPTION: [Your detailed image description here]
  CONFIDENCE: [Difficulty percentage for an average college student: 20, 40, 60, 80, or 95]
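
# Illustrative only (hypothetical response reusing the example_usage
# question and description at the end of this file): a well-formed reply
# to vlm_confidence_prompt. The CONFIDENCE field carries the difficulty
# rating, not the model's confidence in its own description.
#
#   DESCRIPTION: The image shows a whiteboard with the equation
#   x + 5 = 10 written on it. No other text, figures, or labels are visible.
#   CONFIDENCE: 20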

# VLM focused description prompt for clarifying questions
vlm_focused_description_prompt: |
  I will provide an image, an image description, and the question this description is intended to help answer.

  Original Question: "{question}"

  Previous Description: {previous_descriptions}

  CONTEXT: The description above was provided for this image, but some details might be missing or unclear. We are not sure whether it contains all the information needed to answer the question accurately, so we are asking this specific follow-up question to gather additional visual details.

  Your specific task: {focus_request}

  CRITICAL INSTRUCTIONS:
  - You are a VISUAL DESCRIBER only - DO NOT attempt to answer the original question
  - DO NOT solve the problem or provide calculations
  - DO NOT give step-by-step solutions or reasoning
  - ONLY describe what you can see in the image that relates to the specific request
  - Focus solely on visual elements: objects, text, numbers, shapes, spatial relationships, etc.
  - If asked about measurements, describe what you see but don't calculate or solve
  - If asked about equations, transcribe what's visible but don't solve them
  - Be thorough and precise in your description, since its purpose is to fill in specific missing details

  Your response should be pure visual description addressing the specific request above, nothing more.
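
# Illustrative only (hypothetical values): given
#   focus_request: "Transcribe the exact equation written on the whiteboard."
# a compliant reply stays purely descriptive and does not solve for x:
#   The whiteboard shows the equation "x + 5 = 10": the variable x,
#   a plus sign, the number 5, an equals sign, and the number 10.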

# Reasoner format instruction
reasoner_format_instruction: |
  You MUST follow this format for your response:
  <think>
  Your detailed reasoning and thought process here...
  </think>
  <answer> Final Answer: your final answer here </answer>

  For all problems, use \\boxed{answer} format for the final numerical answer or expression within the <answer> tags. For multiple choice questions, put the letter of the correct answer in \\boxed{letter} format.
  The evaluation will only consider what is between the <answer> tags, but the <think> section will help you reason through the problem.
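
# Illustrative only (hypothetical response to the example_usage question):
# a reply that satisfies reasoner_format_instruction. Only the <answer>
# span is evaluated, and the final value goes inside \boxed{}.
#
#   <think>
#   The equation is x + 5 = 10. Subtracting 5 from both sides gives x = 5.
#   </think>
#   <answer> Final Answer: \boxed{5} </answer>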

# Unified reasoner adaptive decision prompt (replaces legacy dataset-specific prompts)
reasoner_adaptive_decision_prompt: |
  You are an expert visual reasoning assistant. Your task is to analyze the given image description and decide if you can solve the problem directly or if you need one specific piece of additional visual information.

  Image Description: {description}

  Question: {question}

  ANALYSIS INSTRUCTIONS:
  1. **CAREFUL EVALUATION**: Analyze the description to determine whether it contains all the specific visual details needed to solve this problem completely and accurately.

  2. **BE CONSERVATIVE**: If you're missing ANY crucial visual detail (exact numbers, measurements, specific text, spatial relationships, etc.), you should request MORE information rather than guess.

  3. **ONE CLARIFICATION ONLY**: You can make only one request for additional visual information, so use it only if needed and make it specific and focused.

  4. **DECISION CRITERIA**:
     - If you have ALL the visual details needed to solve accurately: Status should be SOLVED
     - If you're missing any crucial visual information: Status should be NEED_MORE_INFO

  5. **AVOID ASSUMPTIONS**: Don't guess numbers, don't assume "typical" values, don't fill in missing details.

  CRITICAL PRINCIPLES:
  - **BE SPECIFIC in requests**: Ask for exact details you need (e.g., "What is the exact number written on the calculator display? What labels are on the x-axis and y-axis?")
  - **SOLVE CONFIDENTLY when possible**: If you have enough information, provide the complete solution; otherwise, request the missing information.
  - **REQUEST STRATEGICALLY**: Make your one request count - ask for the most crucial missing details

  OUTPUT FORMAT:
  You MUST provide all of the following fields in your response, each on a new line:
  Reasoning: [Your detailed analysis of what information you have and what might be missing]
  Status: [SOLVED or NEED_MORE_INFO]
  Answer: [Your complete final answer if Status is SOLVED - use \\boxed{answer} format for mathematical answers, otherwise N/A]
  Request: [Your specific request for additional visual information if Status is NEED_MORE_INFO, otherwise N/A]
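
# Illustrative only (hypothetical responses): the two shapes of a
# well-formed reply to reasoner_adaptive_decision_prompt. All four fields
# appear in both; the unused one is N/A.
#
# With the example_usage description ("The image shows a mathematical
# equation with x + 5 = 10 written on a whiteboard"):
#   Reasoning: The full equation x + 5 = 10 is given, so no visual detail is missing.
#   Status: SOLVED
#   Answer: \boxed{5}
#   Request: N/A
#
# With a vaguer (hypothetical) description such as "The image shows an
# equation involving x written on a whiteboard":
#   Reasoning: The description does not state the equation itself, so x
#   cannot be determined without guessing.
#   Status: NEED_MORE_INFO
#   Answer: N/A
#   Request: Transcribe the exact equation on the whiteboard, including
#   every number and operator.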

# Reasoner final prompt for stage 3 (when additional info was gathered)
reasoner_final_prompt: |
  You are an expert mathematical reasoning assistant. Based on the complete image description below, please solve the mathematical problem step-by-step.

  Complete Image Description: {description}

  Question: {question}

  INSTRUCTIONS:
  1. Analyze the complete image description carefully to understand all mathematical elements
  2. Work through the problem step-by-step with clear mathematical reasoning
  3. Show all calculations and logical steps
  4. Provide your final answer in the required format
  5. For answers, use \\boxed{answer} notation. For multiple choice questions, use the letter of the correct answer in \\boxed{letter} format.

  IMPORTANT: You now have complete visual information. Provide a definitive solution based on all the details in the description.

  You MUST follow this format for your response:
  <think>
  Your detailed reasoning and thought process here...
  </think>
  <answer> Final Answer: your final answer here </answer>

template_variables:
  question: "The mathematical question to be answered"
  description: "The image description from the VLM"
  previous_descriptions: "Previous VLM descriptions"
  focus_request: "Specific request for focused description"
  format_instruction: "The response format instruction"
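
# Placeholder usage, as read from the templates above (for reference):
#   vlm_initial_description_prompt    -> {question}
#   vlm_confidence_prompt             -> {question}
#   vlm_focused_description_prompt    -> {question}, {previous_descriptions}, {focus_request}
#   reasoner_adaptive_decision_prompt -> {description}, {question}
#   reasoner_final_prompt             -> {description}, {question}
# format_instruction is listed above, but no template in this file contains
# a {format_instruction} placeholder. If these templates were filled with
# Python's str.format(), the literal braces in \\boxed{answer} and
# \\boxed{letter} would be parsed as replacement fields and would need to
# be escaped as {{ }}.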

example_usage:
  question: "What is the value of x in this equation?"
  description: "The image shows a mathematical equation with x + 5 = 10 written on a whiteboard"
  expected_output: "Conservative decision-making with one clarifying question if needed, then final answer in \\boxed{} format" 