DAG-Math: Graph-Guided Mathematical Reasoning in LLMs

Published: 17 Oct 2025 · Last Modified: 21 Nov 2025 · MATH-AI 2025 Poster · CC BY 4.0
Keywords: LLMs, mathematical reasoning, directed acyclic graphs
TL;DR: We propose a new pipeline that models CoT on directed acyclic graphs (DAGs), introduce the concept of logical closeness, and precisely evaluate the mathematical reasoning ability of LLMs via the proposed DAG-MATH format.
Abstract: Large Language Models (LLMs) achieve strong results on mathematical reasoning tasks with Chain-of-Thought (CoT), yet it remains unclear whether this reflects genuine rule-based reasoning or heuristic search. We propose a framework that models CoT as a stochastic process over directed acyclic graphs (DAGs), where nodes denote intermediate states and edges represent rule applications. Within this setting, we introduce \textbf{logical closeness}, a metric that measures how closely an LLM's derivation trajectory adheres to the DAG structure, extending evaluation beyond final-answer accuracy (PASS@$k$). To operationalize this, we design the \emph{DAG-MATH} CoT format and construct a benchmark that elicits trajectories in this form, enabling structured evaluation of reasoning. Evaluations on mathematical reasoning datasets reveal statistically significant differences in reasoning fidelity across LLM families, even when final-answer accuracy is comparable, highlighting the gap between producing correct answers and performing rule-consistent inference. Our approach bridges free-form CoT and structured systems, and provides actionable diagnostics for the quality of LLM reasoning.
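
The abstract does not spell out the scoring procedure, but the idea lends itself to a compact illustration. Below is a minimal Python sketch, assuming a trajectory is recorded as a sequence of (premises, conclusion) steps over named intermediate states. The `logical_closeness` function here is a hypothetical proxy (the fraction of steps whose cited premises were already derived), not the paper's exact metric; `pass_at_k` is the standard unbiased PASS@$k$ estimator included for comparison.

```python
from math import comb

def logical_closeness(steps, givens):
    """Illustrative proxy for logical closeness (NOT the paper's definition).

    steps:  list of (premises, conclusion) pairs, where premises is a set of
            state names consumed by a rule application and conclusion is the
            state it produces (edges of the derivation DAG).
    givens: set of state names available at the start (hypotheses/axioms).

    Returns the fraction of steps whose premises were all derived before use.
    """
    derived = set(givens)
    supported = 0
    for premises, conclusion in steps:
        if all(p in derived for p in premises):
            supported += 1
        derived.add(conclusion)  # later steps may build on this state
    return supported / len(steps) if steps else 1.0

def pass_at_k(n, c, k):
    """Unbiased PASS@k estimator: n samples, c correct, budget k."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical trajectory: the last step cites state "e", which was never derived.
steps = [({"a", "b"}, "c"), ({"c"}, "d"), ({"e"}, "f")]
print(logical_closeness(steps, givens={"a", "b"}))  # 0.667
print(pass_at_k(n=10, c=3, k=1))                    # 0.3
```

The contrast the sketch makes concrete is the one the abstract draws: a model can reach the right final state (high PASS@$k$) while citing premises it never established, which a trajectory-level score like logical closeness is designed to expose.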
Submission Number: 107