DATA_CONSTRUCTION_USER_PROMPT: str = """
You are a helpful AI assistant. Previously, when solving a question, you had to think step by step without being able to use code for calculation or logical verification. Now you can write code for both computational tasks and logical reasoning verification. The code is executed in a sandbox, and its result is returned to enhance your reasoning process.

**Important Note on Code Usage**: You now have two parallel tools to enhance your reasoning:
1. **Python calculation code** - for numerical computations, data processing, and mathematical operations
2. **Z3 logical verification** - for verifying logical reasoning, constraints, and formal proofs using the Z3 theorem prover


These are complementary tools serving different purposes. Use calculation code for computational problems and Z3 for logical verification of reasoning steps. **Do not use both tools for the same step**; choose the most appropriate tool based on whether that step needs computation or logical verification.

The thinking process can have multiple code snippets. Each code snippet is wrapped with:
<code>
```python
code snippet
```
</code>, and should be executable. The returned result is wrapped with <interpreter> execution results </interpreter>.

**Critical: Code Independence Requirement**
Each code snippet must be completely self-contained and executable independently. This means:

- Each code block should include all necessary imports
- Each code block should define all variables it uses (do not rely on variables from previous code blocks)
- Each code block should be able to run successfully if executed in isolation
- If you need values from previous calculations, redefine or recalculate them in the new code block
- Think of each <code> block as being executed in a fresh Python environment
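
For example, a block that needs a value from an earlier step recomputes it rather than reusing it. The radius below is a hypothetical value chosen only for illustration:

```python
# GOOD: Self-contained block; imports and inputs are redefined here
import math  # Imports are repeated in every block
radius = 2.5  # Hypothetical value from an earlier step, redefined rather than reused
area = math.pi * radius ** 2
print("Area:", round(area, 4))
```

Run in isolation, this prints `Area: 19.635`.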

**Guidelines for Z3 Usage:**

Z3 verification should ONLY be used when it provides genuine formal verification value:

Do NOT use Z3 for:

- Verifying simple arithmetic calculations (e.g., 2 + 2 = 4, or 1.0 × (−2.0 − 4.0) = −6.0)
- Checking calculations with concrete numbers that Python already computed
- Adding concrete values as constraints and then verifying them (this is circular reasoning)
- Repeating what numerical computation already verified

DO use Z3 for:

- Proving general mathematical properties or identities that hold for ALL values (using symbolic variables)
- Verifying complex logical relationships with multiple interrelated constraints
- Checking satisfiability of constraint systems or finding whether solutions exist
- Proving inequalities or optimization conditions symbolically
- Verifying that no counterexamples exist for a general claim
- Formal verification of logical reasoning steps that involve symbolic relationships

**Key principle:** Use symbolic variables (not concrete numbers) to prove general statements. If the problem only involves concrete arithmetic with specific numbers, skip Z3 verification entirely.

Examples of INCORRECT Z3 usage:
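```python
# BAD: Verifying concrete arithmetic
from z3 import *
m, v_i, v_f, delta_p = Reals('m v_i v_f delta_p')
solver = Solver()
solver.add(m == 1.0)
solver.add(v_i == 4.0)
solver.add(v_f == -2.0)
solver.add(delta_p == m * (v_f - v_i))
solver.add(delta_p == -6.0)
print(solver.check())  # Prints sat: this only confirms 1.0 * (-2.0 - 4.0) == -6.0, no value added
```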
Examples of CORRECT Z3 usage:

```python
# GOOD: Proving general algebraic equivalence
from z3 import *  # Remember: each code block needs its imports
m, v1, v2 = Reals('m v1 v2')
solver = Solver()
solver.add(m > 0)  # General constraint, not specific value
# Prove: Δp = m(v2 - v1) is equivalent to Δp = m·v2 - m·v1 for ALL values
solver.add(m * (v2 - v1) != m * v2 - m * v1)
result = solver.check()
print("Verification result:", result)
assert result == unsat  # Proves no counterexample exists
```

**Guidelines for Python Calculation Code Usage:**

When NOT to use code:

- Simple arithmetic that can be done mentally
- Basic algebra or formula substitution
- Straightforward unit conversions
- Verifying obvious mathematical identities
- Problems where all steps are elementary calculations

Code Usage Limit:

- For problems solvable with basic math: Use code AT MOST 1-2 times (or not at all)
- For complex computational problems: Use code AT MOST 3-4 times
- Each code block should serve a distinct, necessary purpose
- **Never use multiple code blocks to verify the same calculation in different ways**


Goal:
Modify the original thinking process to make it more accurate by:

- Replacing manual calculation steps with Python code snippets and their execution results
- Adding Z3 logical verification when it provides genuine formal verification beyond simple arithmetic
- Keeping the core reasoning logic intact, including any unsuccessful attempts
- Adding code only where it provides genuine value
- Ensuring each code snippet serves a unique, necessary purpose
- Using Python code or Z3 verification no more than 4 times in total
- Wrap the revised thinking process within <revised_thinking_process> and </revised_thinking_process>.

User Question:
{question}

Original Thinking Process (without code interpreter's support):
<original_thinking_process> {original_response} </original_thinking_process>

Details:

1. Identify sections where Python code execution could speed up reasoning or make calculations more accurate
2. Identify logical reasoning blocks that would benefit from Z3 formal verification (general properties, not specific calculations)
3. Replace manual calculation steps with code snippets and corresponding interpreter's execution results
4. Each code snippet must be self-contained with all necessary imports and variable definitions
5. Keep the logical flow of the reasoning process intact, including any failed exploration attempts
6. The code inside each snippet must be a complete script that can execute independently, including necessary imports, with no markdown symbols inside the code itself
7. For Z3 verification, always use "from z3 import *" at the beginning of each Z3 code block
8. Outputs in code snippets must explicitly call the print function
9. Each code snippet must be immediately followed by its execution result, enclosed in <interpreter></interpreter> tags
10. Execution results should exactly match what the code would actually print, with no extra or missing tokens
11. Z3 should prove general properties, not verify specific numerical results that Python already computed
12. If Z3 would only repeat what Python arithmetic already verified, omit it entirely
13. Remember: variables defined in one code block are NOT available in subsequent code blocks - redefine them as needed
14. When performing calculations, format numerical outputs to 2-4 decimal places using round() or f-strings (e.g., print(f"{{result:.2f}}")) to avoid displaying unnecessary floating-point digits. Choose precision appropriate to the context—sufficient for subsequent reasoning.
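
A minimal illustration of item 14, using an arbitrary value; both round() and the built-in format() keep long floating-point tails out of the printed result:

```python
# Illustrative only: the value is arbitrary
import math
result = math.sqrt(2) * 10  # About 14.142135..., with many trailing digits
print("Rounded to 4 places:", round(result, 4))
print("Formatted to 2 places:", format(result, ".2f"))
```

This prints `Rounded to 4 places: 14.1421` and `Formatted to 2 places: 14.14`.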

Revised Thinking Process (with selective, self-contained Python computation blocks and Z3 formal verification):
"""


JUDGE_SYSTEM_PROMPT: str = """
You are a precise answer validator. Your task is to determine if a student's answer is equivalent to the ground truth answer.

Guidelines:
1. Do not solve the problem - only compare the given answers
2. Check for mathematical equivalence, not just exact string matches
3. Consider different valid formats (e.g., 1/2 = 0.5 = 50%)
4. Ignore minor formatting differences (spaces, capitalization)
5. For algebraic expressions, check if they reduce to the same result

Output Format:
- If answers are equivalent: 
[[FINAL DECISION]] Yes
- If answers are not equivalent:
[[FINAL DECISION]] No
- No additional explanation or commentary needed
"""

JUDGE_USER_PROMPT: str = """
[[QUESTION]]
{question}
[[GROUND TRUTH]]
{ground_truth}
[[STUDENT ANSWER]]
{student_answer}
"""


# consistency check
CONSISTENCY_CHECK_PROMPT: str = """
You are a consistency checker for chain-of-thought reasoning processes that include code execution results.

**Context**: 
The interpreter execution results (content within <interpreter></interpreter> tags) have been manually replaced with actual execution outputs. Your task is to identify and fix any inconsistencies that arose from these replacements.

**Your task**:
Review the provided thinking process and verify that the reasoning context remains consistent with the actual interpreter execution results.

**Specifically check for inconsistencies caused by interpreter result changes**:

1. **Mismatched Variable References**: 
   - The text references a calculated value that doesn't match what's in the <interpreter> block
   - Example: Text says "we get x = 5" but <interpreter> shows x = 7

2. **Incorrect Subsequent Calculations**:
   - Later reasoning uses old/wrong values instead of the actual interpreter output
   - Example: Interpreter shows result is 3.14, but next step uses 3.0

3. **Contradictory Conclusions**:
   - Conclusions or intermediate steps contradict the actual interpreter results
   - Example: Interpreter shows "unsat" but reasoning claims "satisfiable"

4. **Outdated Narrative**:
   - Descriptive text doesn't align with what the code actually computed
   - Example: Text says "calculation yields approximately 2" but interpreter shows 2.7853

**Output Format**:

If ALL reasoning is CONSISTENT with the interpreter results:
- Output exactly: `No`
- Do not output anything else

If you find ANY inconsistencies caused by interpreter result changes:
- Revise the thinking process to align with the actual interpreter outputs
- Output the complete revised thinking process in this exact format:

<consist_thinking>
[Complete revised thinking process with all inconsistencies corrected to match interpreter results]
</consist_thinking>

**Revision Guidelines**:
- Keep all <code> and <interpreter> blocks exactly as provided (DO NOT modify them)
- Only modify the narrative/reasoning text to match the interpreter outputs
- Ensure every reference to a calculated value matches the interpreter result
- Maintain the overall reasoning structure and approach
- Update conclusions if they were based on incorrect interpretation of results
- Ensure that the final answer remains unchanged

**Input**:

User Question:
{question}

Thinking Process to Check:
{thinking_process}

**Your analysis**:
"""

