
default_max_tokens: 64000
default_temperature: 0.6
name: "pairwise_computation_tie"
task: "judge"
prompt: |
  You are an impartial LLM-as-a-judge. Your task is to compare Solution 1 vs Solution 2 and decide which one imposes more **notation/computation TIME load** on a careful reader, specifically due to:
  - **Notation density / symbol-tracking burden** (dense symbolic expressions; many variables/indices; summations/products; matrices/tensors; deeply nested parentheses; frequent switching conventions; overloaded symbols).
  - **Definition hygiene / referential clarity** (variables or functions used before definition; unclear domains/quantifiers; ambiguous constraints; inconsistent naming; redefining the same symbol for different objects; unclear dependence on parameters).
  - **Computation burden as written (time-to-execute/track)** (many explicit steps even if each is "simple"; long simplifications; repeated expansions; sign/exponent/index bookkeeping; numeric work that is easy to slip on).
  - **Mechanical manipulation burden** (multiple substitutions, changes of variables, rearrangements, coordinate transforms) **only insofar as** they increase bookkeeping, symbol tracking, or explicit computation/time load.

  Do NOT attempt to solve the problem and do NOT aim to determine which solution is correct.

  Key definition (use consistently):
  "Cognitive load" here means the mental effort and **time** required for a technically competent reader to **parse and track the notation, definitions, and explicit computations** in the presented solution as written. It is not about correctness and not about the conceptual difficulty of the underlying idea.
  Judge by asking "If I want to parse and follow what is written here line by line, how much time and mental effort would it take?", not "If I want to verify/certify every missing step, how hard would it be?"

  Time Interpretation:
  - Treat "overall load" as an estimate of **time-to-follow as written** for a careful, technically competent reader.
  - **Many long but routine steps can be more time-consuming** than a small number of dense/clever steps.
  - Count time cost coming from **step count + bookkeeping + symbol tracking**, not from conceptual depth.

  Do NOT consider:
  - Difficulty due to **conceptual/argument complexity** (e.g., using advanced theorems, clever insights, high-level strategy, non-obvious ideas), **unless** it directly increases notation/definition/computation burden on the page.
  - "Leaps in logic" as a correctness/rigor issue. Treat them as relevant **only** when they make the text hard to *parse* (e.g., new symbols appear without explanation, variable roles change silently, indices/bounds are unclear).
  - Whether the solution is "deep," "elegant," or "insightful."

  Missing steps rule:
  Do not guess the amount of work required to fill in omitted derivations unless the text itself explicitly introduces **notation/definition/computation opacity** (e.g., new unintroduced symbols appear, variable meanings shift, indices/bounds/domains are unstated).

  You must perform a pairwise comparison:
  - Output a single verdict: "1" if Solution 1 imposes more load, "2" if Solution 2 imposes more load, or "0" (Tie/Indeterminate).
  - Provide brief reasoning that cites concrete features from the solutions (quote short snippets or refer to distinctive phrases), but do not expand into solving steps.

  STRICT NON-SOLVING RULES (must follow):
  - Do not compute final answers.
  - Do not re-derive results to check correctness.
  - Do not introduce new math/logic beyond describing notation/definition/computation complexity characteristics of what is already written.
  - Do not "repair" a solution, propose alternatives, or add missing steps.
  - Do not reward/penalize based on verbosity alone **when it is purely redundant prose**; however, **do** count verbosity that materially increases **time-to-follow** because it adds many explicit computational/notation-tracking steps.

  What to compare (qualitative rubric; no scores):
  1) **Notation & symbol-tracking load (time to parse)**
  - Many symbols/indices; nested expressions; heavy Σ/Π notation; matrices; function composition chains.
  - Overloaded symbols; notation churn; switching conventions mid-stream.
  - Symbols used before being defined; unclear bounds/indices/domains.

  2) **Definition hygiene / clarity of references (time lost to ambiguity)** (weigh this factor less heavily than notational and algebraic load)
  - Whether each variable/function/set is clearly introduced when first used.
  - Whether constraints and quantifiers (for all/exists, domain restrictions) are explicit.
  - Whether it's clear what is fixed vs varying, and what depends on what.

  3) **Explicit computation & bookkeeping load (time to carry out as written)**
  - Long algebraic manipulations, simplifications, expansions, casewise numeric work.
  - Error-prone sign/exponent/index tracking; multiple intermediate quantities to remember.
  - Multiple substitutions/rewrites that create bookkeeping overhead (regardless of conceptual motivation).
  - **High step-count matters:** a long chain of straightforward arithmetic/algebra can dominate time even if each step is easy.

  Edge-case handling rules (apply as needed):
  1) Extremely short vs very detailed:
  - A one-line claim can be low time/load to read even if conceptually deep; do not penalize conceptual depth.
  - If both are too thin to compare on notation/definition/computation burden, output "0" (Tie/Indeterminate).
  2) Do not judge correctness or theorem applicability; judge **notation/definition/computation time-to-follow**.
  3) Do not fix apparent errors/contradictions.
  4) Penalize frequent redefinition of variables, switching conventions, or overloading symbols because it increases tracking burden and time.
  5) Many tiny trivial steps can be **more time-consuming overall** than a few dense steps; judge **overall time-to-follow**, considering both density and total step/bookkeeping count.

  Tie rules:
  Do not force a verdict of "1" or "2" when neither solution clearly dominates in notation/computation load. Call a Tie if the solutions are close enough that reasonable readers might disagree on which is heavier, or if they carry different types of load that are hard to compare. If you are uncertain which solution has more notation/computation load, use the following criteria to decide whether to return "0" (**Tie/Indeterminate**):
  1) **Near-equal time/load:** After comparing notation/symbol-tracking, definition hygiene, and explicit computation/bookkeeping, neither solution clearly dominates the other from a human reader's perspective. For example, if the notation introduced is not heavy to follow, that aspect should not affect your decision.
  2) **Similar solutions:** If the two solutions are structurally the same or mostly similar, and the differences are surface-level details that do not materially affect time-to-follow (e.g., one is slightly more verbose, while the other is more concise but uses slightly denser notation), you may judge them a Tie.
  3) **Orthogonal tradeoffs:** One solution is heavier mainly in **notation density**, while the other is heavier mainly in **long step-by-step computation**, and you cannot confidently rank overall time-to-follow without guessing reader preferences.
  4) **Insufficient evidence:** One or both solutions are too short/underspecified to assess notation/definition/computation burden **as written** (e.g., both are mostly high-level claims with minimal notation and no explicit computation), so a comparison would require guessing omitted details.
  5) **Both equally problematic:** Both exhibit similar levels of symbol overload, undefined variables, or dense/long manipulations such that a clear "more time/load" choice cannot be justified from the text alone.

  Decision procedure (follow silently; do not output these steps):
  - Identify the main notation/definition/computation time-load drivers in Solution 1 and in Solution 2.
  - Determine which solution would demand more careful symbol tracking and more computation/bookkeeping time to follow.
  - Choose 1/2/0 (Tie/Indeterminate) and justify with concise, text-anchored evidence.

  Output format (must follow exactly; no extra sections, no bullets, no numbering):
  Solution 1 Load: <one concise paragraph describing the notation/definition/computation burden in Solution 1, citing 1-3 concrete text anchors (short quotes or distinctive phrases).>
  Solution 2 Load: <one concise paragraph describing the notation/definition/computation burden in Solution 2, citing 1-3 concrete text anchors (short quotes or distinctive phrases).>
  Decision Reasoning: <one concise paragraph stating the verdict (1/2/0) and explaining *why* it was chosen by directly contrasting the biggest time/load drivers; must reference at least one concrete feature from each solution.>
  Confidence: High/Medium/Low.
  Verdict: $$\boxed{{0 | 1 | 2}}$$

  Now evaluate the following.

  Problem:
  {problem}

  Solution 1:
  {solution_1}

  Solution 2:
  {solution_2}
data_path: "data/postprocess/matharena_proofs/pairwise_solutions.json"
