SHAPE of Chain-of-Thought in Math Reasoning

Jonghyun Song; Sangjun Song; Minjae Oh; Haesung Pyun; SungSik Lee; Yohan Jo

Published: 17 Jun 2026, Last Modified: 22 Jun 2026

Venue: ICML 2026 AI4Math Workshop Poster

License: CC BY 4.0

Code Url: https://github.com/holi-lab/SHAPE-of-CoT

Keywords: Chain-of-Thought Analysis, Reasoning Diagnostics, Heuristic Analysis, Semantic-Space Tracking, Reinforcement Learning from Verifiable Rewards, Post-training

TL;DR: SHAPE analyzes Chain-of-Thought reasoning at the level of mathematical interpretations and heuristics, post-training narrows heuristic diversity, and heuristic guidance can improve RL-based math reasoning.

Abstract: Large language models (LLMs) achieve strong performance on mathematical reasoning benchmarks, yet the mathematically meaningful skills underlying their reasoning remain underexplored. We introduce SHAPE, a framework that analyzes Chain-of-Thought (CoT) trajectories through two lenses developed in mathematics education: (1) semantic spaces, the model's evolving mathematical interpretations of a problem (e.g., algebraic, geometric), and (2) heuristics, the specific mathematical actions taken within those spaces (e.g., simplifying the problem, working backward). We first use SHAPE to analyze the reasoning patterns of various models. Our findings reveal that the mathematical heuristics employed by a model better explain final answer correctness than traditional CoT features. Furthermore, models are likely to reach correct solutions by concentrating their reasoning effort within a few semantic spaces rather than exploring many disparate ones, a pattern consistent with human behavior. Next, we utilize the SHAPE lens to evaluate whether post-training truly enhances mathematical proficiency. We find that reinforcement learning induces mode-seeking in heuristic usage. Lastly, we post-train LLMs by promoting diverse heuristics and demonstrate that doing so improves accuracy. Overall, SHAPE provides a theoretically grounded diagnostic framework for decoding LLM reasoning and offers a new path toward post-training LLMs for math reasoning. The code for our model is available at https://github.com/holi-lab/SHAPE-of-CoT.

Submission Number: 94
