Track: long paper (up to 10 pages)
Keywords: Reasoning analysis, large reasoning models, puzzles, reasoning graphs, structural metrics
Abstract: Large reasoning models (LRMs) are often evaluated using metrics such as final-answer accuracy or token count. However, identical scores on these metrics can hide fundamentally different reasoning structures. To address this limitation, we introduce a scalable LRM benchmark of logic puzzles and a pipeline that converts unstructured traces into verifiable reasoning graphs of claims and dependencies. This turns reasoning into a structured, measurable object whose topology can be quantitatively analyzed. Building on this, we define a reasoning efficiency metric that quantifies how concentrated the model's logical flow is. Our analysis of open-source reasoning models shows that structural measurements separate behaviors that token count and accuracy conflate, providing a practical tool for diagnosing failure modes and comparing how reasoning scales with puzzle difficulty.
Presenter: ~Frédéric_Berdoz1
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 79