Keywords: Mathematical Reasoning, Graph-Structured Reasoning, Chain-of-Thought, Reinforcement Learning, Step-level Optimization
Abstract: Despite recent progress, large language models (LLMs) for mathematical reasoning often exhibit fragile behaviors, where correct answers are produced despite invalid or incoherent intermediate reasoning. We identify two recurring structural pathologies in Chain-of-Thought (CoT) reasoning: disconnected steps, where intermediate results are not reused, and weak logical flow, where steps are loosely or incorrectly linked yet still yield correct answers. These failures are difficult to address under outcome-only supervision.
To mitigate these issues, we propose the Graph-structured Stepwise Reasoning Framework (GSRF), which reformulates implicit CoT into a Graph-structured Stepwise CoT (GS-CoT) that makes inter-step dependencies explicit. Building on this structure, we introduce Graph-guided Group Relative Policy Optimization (G-GRPO), which incorporates process-level rewards that encourage step reuse and alignment with the final answer.
Extensive experiments on both textual and multimodal mathematical reasoning benchmarks demonstrate that GSRF achieves competitive performance while producing more faithful, coherent, and structurally grounded reasoning traces.
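The two process-level signals described in the abstract can be made concrete with a small sketch. The following is an illustrative toy example, not the authors' implementation: the `Step` dataclass, the `depends_on` field, and both reward functions are hypothetical names chosen here to show how explicit inter-step dependencies could support a step-reuse reward and a final-answer-alignment reward.

```python
# Illustrative sketch (not the paper's code): a minimal graph-structured
# stepwise CoT where each step records which earlier steps it reuses,
# plus toy process-level rewards for step reuse and answer alignment.
from dataclasses import dataclass, field


@dataclass
class Step:
    idx: int
    text: str
    depends_on: list = field(default_factory=list)  # indices of reused steps


def reuse_reward(steps):
    """Fraction of non-initial steps that reuse at least one earlier result."""
    later = steps[1:]
    if not later:
        return 0.0
    return sum(1 for s in later if s.depends_on) / len(later)


def alignment_reward(steps, final_answer):
    """1.0 if the last step's text states the final answer, else 0.0."""
    return 1.0 if steps and final_answer in steps[-1].text else 0.0


# Toy trace: step 2 reuses steps 0 and 1; the last step states the answer.
trace = [
    Step(0, "a = 3 + 4 = 7"),
    Step(1, "b = 2 * 5 = 10"),
    Step(2, "a + b = 17", depends_on=[0, 1]),
]
print(reuse_reward(trace), alignment_reward(trace, "17"))  # 0.5 1.0
```

A trace whose intermediate results are never reused (all `depends_on` lists empty) would score zero on the reuse reward even if its final answer were correct, which is exactly the "disconnected steps" pathology the abstract identifies.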
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Mathematical Reasoning, Graph-Structured Reasoning, Chain-of-Thought, Reinforcement Learning, Step-level Optimization
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 9613