version: "v1"
created: "2025-01-30T00:00:00"
description: "Two-stage reasoning prompts matching the original TwoStageModel implementation"
scaffold_type: "two_stage"
git_hash: null
experiment_id: null
author: "<ANONYMIZED>"

# VLM prompts for image description and verification
vlm_description_prompt: |
  I need your help analyzing this image to prepare for answering the following question:

  {question}

  IMPORTANT: DO NOT answer the question directly. Instead, provide a comprehensive and detailed description of everything visible in the image that could be relevant for answering this question.

  Focus on describing:
  - All objects, people, text, and visual elements in the image
  - Spatial relationships between different elements
  - Any text content that is visible, transcribed exactly
  - Colors, shapes, patterns, and visual attributes
  - Relevant contextual details and background information

  Your description should be detailed enough that someone could mentally reconstruct the image without seeing it, but DO NOT provide step-by-step instructions on how to recreate it.

vlm_verification_prompt: |
  I will provide an image, an image description, and the question this description is intended to help answer.

  Original Question: "{question}"

  Image Description: {description}

  Your task is to critically evaluate the **Image Description** ONLY for its accuracy and completeness in representing the visual elements of the image that are relevant to the **Original Question**.

  **DO NOT ANSWER THE ORIGINAL QUESTION.** Your sole focus is the quality of the image description itself.

  Please perform the following steps:
  1. Compare the **Image Description** to the image.
  2. Identify any inaccuracies or missing visual details in the description that would be important for answering the **Original Question**.
  3. If the **Image Description** is perfect and needs no changes, respond ONLY with the exact phrase: "The description is accurate and complete."
  4. If the **Image Description** needs improvement, provide a revised and more accurate/complete description. Start your response ONLY with "Improved description:" followed by the full, corrected description. Do not include any other text or explanations, and specifically, do not include an answer to the **Original Question** in your response.

  Remember, your output should be EITHER the exact phrase "The description is accurate and complete." OR "Improved description: [full corrected description here]". Nothing else.

# Reasoner prompts: a shared format instruction, then one prompt per dataset type
reasoner_format_instruction: |
  You MUST follow this format for your response:
  <think>
  Your detailed reasoning and thought process here...
  </think>
  <answer> Final Answer: your final answer here </answer>

  The evaluation will only consider what is between the <answer> tags, but the <think> section will help you reason through the problem.

reasoner_mcq_prompt: |
  Based on the following image description, please answer the question:

  Image Description: {description}

  Question: {question}

  Please provide your answer by selecting the option letter (A, B, C, etc.) that corresponds to the correct answer.
  {format_instruction}

reasoner_math_prompt: |
  Based on the following image description, please answer the question:

  Image Description: {description}

  Question: {question}

  Please work through this step-by-step to solve the problem.
  {format_instruction}

reasoner_detail_prompt: |
  Based on the following image description, please answer the question:

  Image Description: {description}

  Question: {question}

  Please provide a detailed answer.
  {format_instruction}

reasoner_default_prompt: |
  Based on the following image description, please answer the question:

  Image Description: {description}

  Question: {question}

  Please provide a clear and concise answer.
  {format_instruction}

template_variables:
  question: "The question to be answered about the image"
  description: "The image description from the VLM"
  format_instruction: "The response format instruction"
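  # Placeholder usage by template, as it appears in the prompts above:
  #   vlm_description_prompt:   {question}
  #   vlm_verification_prompt:  {question}, {description}
  #   reasoner_*_prompt:        {description}, {question}, {format_instruction}
  # Assumption (not stated in this file): {format_instruction} is presumably
  # filled with the text of reasoner_format_instruction.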

example_usage:
  question: "What is the value of x in this equation?"
  description: "The image shows the equation x + 5 = 10 written on a whiteboard"
  expected_output: "Step-by-step solution with the final answer in the required format"
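  # Illustrative sketch only (not taken from the original implementation):
  # for the example above, a reasoner response that follows
  # reasoner_format_instruction might look like this:
  #   <think>
  #   Subtracting 5 from both sides of x + 5 = 10 gives x = 5.
  #   </think>
  #   <answer> Final Answer: 5 </answer>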