Track: tiny / short paper (up to 4 pages)
Keywords: Mathematical Reasoning, Equation-Only Supervision, Symbolic Reasoning, Math Word Problems
TL;DR: We study equation-only supervision for math word problems, comparing three equation representations: numeric (16−3−4 = 9), symbolic (v0−v1−v2 = v3), and semantic variables (eggs laid per day - eggs eaten for breakfast = eggs sold at market).
Abstract: While Large Language Models excel at mathematical reasoning with Chain-of-Thought prompting, their ability to perform systematic arithmetic reasoning without natural language scaffolding remains poorly understood. We investigate equation-only supervision, where LLMs map natural language problems directly to symbolic equation sequences without intermediate explanations. This approach separates reasoning structure generation from arithmetic computation, enabling compact equation storage and deterministic evaluation by external symbolic systems.
We fine-tune LLaMA 3.1 Instruct 8B on GSM8K across three representations: numeric (16−3−4 = 9), symbolic (v0−v1−v2 = v3), and semantic variables (eggs laid per day - eggs eaten for breakfast = eggs sold at market). Numeric equations achieve 67.85% accuracy on GSM8K with strong generalization (63.68% on GSM-Symbolic), while semantic variables perform comparably (66.41%). Surprisingly, pure symbolic variables underperform significantly (52.46%), revealing that semantic grounding is crucial for learning equation structures. Our dual evaluation metrics show that equation-calculated accuracy often matches or exceeds LLM-calculated accuracy, indicating that improving structure generation, rather than arithmetic computation, remains the primary challenge. This diagnostic study provides empirical insights into LLMs’ structured mathematical reasoning capabilities, with implications for building reliable systems that leverage symbolic computation.
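To make the dual evaluation idea concrete, the following minimal Python sketch recomputes the left-hand side of a generated equation with a small arithmetic evaluator and returns it next to the value the model wrote after the equals sign. The function names (`evaluate_equation`, `_eval`), the `lhs = rhs` string format, and the `env` variable-binding dictionary are assumptions made for illustration; the abstract does not specify the paper's actual evaluator or output format.

```python
import ast
import operator

# Binary operators the sketch accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node, env):
    """Recursively evaluate a restricted arithmetic AST (hypothetical helper)."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, env)
    if isinstance(node, ast.Constant):
        # Accept plain numbers only (bool is a subclass of int, so exclude it).
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
        raise ValueError("unsupported constant")
    if isinstance(node, ast.Name):
        return env[node.id]  # symbolic variable such as v0, looked up in env
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    raise ValueError("unsupported expression")


def evaluate_equation(equation, env=None):
    """Return (equation-calculated value, right-hand side as written).

    Assumes a single 'lhs = rhs' string; U+2212 minus signs are normalized.
    """
    lhs, rhs = equation.replace("\u2212", "-").split("=", 1)
    computed = _eval(ast.parse(lhs.strip(), mode="eval"), dict(env or {}))
    return computed, rhs.strip()
```

Under these assumptions, `evaluate_equation("16-3-4 = 8")` returns `(9, "8")`: the equation structure is correct and the symbolic evaluation yields 9, even though the value written after the equals sign is wrong. This is one way an equation-calculated answer could be scored separately from an LLM-calculated one.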
Presenter: ~Jonathan_Chung2
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 83