Think in Graphs: Infrastructure and Benchmark for Large Language Model Reasoning Frameworks

08 Sept 2025 (modified: 08 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Language Model, Benchmark, Graph
Abstract: Enhancing the reasoning ability of Large Language Models (LLMs) has become a central focus of current research. While prompt-engineering-based approaches have significantly improved LLM performance, the increasing complexity of reasoning frameworks has driven up development costs. Moreover, these frameworks often require extensive redesign to adapt to different tasks, and their performance depends heavily on these task-specific designs, which makes it difficult to establish clear and consistent evaluation benchmarks. To address these issues, we propose a unified infrastructure that represents reasoning processes as graphs, thereby standardizing and structuring the reasoning workflow. This approach enables more consistent and efficient implementation of diverse reasoning frameworks, facilitates objective comparison, and supports deeper analysis through graph algorithms. Building on this infrastructure, we develop an LLM reasoning benchmark and demonstrate its effectiveness through multiple experiments, enabling more comprehensive evaluation and analysis.
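To make the core idea concrete, here is a minimal sketch of what "reasoning as a graph" could look like. This is not the paper's API: the names `ReasoningGraph`, `Node`, and the stubbed `call_llm` are illustrative assumptions. The sketch represents each reasoning step as a node in a DAG and executes steps in dependency order, so chain-of-thought becomes a path and tree/graph-of-thought variants simply add branches and merges.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for an LLM query; a real framework would call a model here.
def call_llm(prompt: str) -> str:
    return f"<answer to: {prompt!r}>"

@dataclass
class Node:
    """One reasoning step: a prompt, the names of its parent steps, and its output."""
    name: str
    prompt: str
    parents: List[str] = field(default_factory=list)
    output: str = ""

class ReasoningGraph:
    """A DAG of reasoning steps, executed in dependency (topological) order."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, name: str, prompt: str, parents: Optional[List[str]] = None) -> None:
        self.nodes[name] = Node(name, prompt, parents or [])

    def run(self) -> Dict[str, str]:
        done: Dict[str, str] = {}

        def visit(name: str) -> str:
            # Memoized depth-first traversal: each step runs once,
            # after all of its parents have produced their outputs.
            if name in done:
                return done[name]
            node = self.nodes[name]
            context = "\n".join(visit(p) for p in node.parents)
            node.output = call_llm(f"{context}\n{node.prompt}".strip())
            done[name] = node.output
            return node.output

        for name in self.nodes:
            visit(name)
        return done

# Usage: a decompose/solve/merge pattern expressed as a four-node DAG.
g = ReasoningGraph()
g.add("decompose", "Break the problem into subproblems.")
g.add("solve_a", "Solve subproblem A.", parents=["decompose"])
g.add("solve_b", "Solve subproblem B.", parents=["decompose"])
g.add("merge", "Combine the partial solutions.", parents=["solve_a", "solve_b"])
print(g.run()["merge"])
```

Once frameworks share such a graph representation, standard graph algorithms (depth, branching factor, topological structure) can be applied to compare them, which is the kind of analysis the abstract alludes to.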
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 3213