Sparse Spectral Signatures of Reasoning: Model-Agnostic Verification via Sentence-Level Graph Signals

Published: 05 Mar 2026, Last Modified: 25 Apr 2026
Venue: ICLR 2026 Workshop LLM Reasoning
License: CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: spectral graph theory, chain-of-thought verification, reasoning verification, graph signal processing, model-agnostic evaluation, LLM reasoning
TL;DR: Spectral metrics from sentence-level semantic graphs built solely from chain-of-thought text discriminate correct from incorrect LLM reasoning across domains and models, including closed-source ones, without access to model internals.
Abstract: Recent work has shown that spectral properties of internal attention graphs can distinguish valid from invalid mathematical reasoning in LLMs. However, attention-based methods require access to model weights, excluding closed-source models and production deployments. We investigate whether analogous spectral signatures exist in external sentence-level semantic graphs constructed solely from chain-of-thought text. We construct cosine-similarity threshold graphs over sentence embeddings and compute spectral metrics from the graph Laplacian, requiring only black-box text output. Across 2,400 traces spanning three reasoning domains (mathematical, first-order logic, deductive) and four model architectures, including the closed-source Claude Sonnet 4, we find that spectral metrics reliably discriminate correct from incorrect reasoning, with 9 of 12 domain-model conditions significant at p<0.05 (AUC up to 0.77). Spectral features add up to +14.9% AUC over text-level baselines, with the largest gains when baselines are weakest, demonstrating that spectral analysis captures structural reasoning properties orthogonal to surface text quality.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 74
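
The graph construction named in the abstract (a cosine-similarity threshold graph over sentence embeddings, followed by spectral metrics of the graph Laplacian) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `spectral_metrics`, the threshold of 0.5, the use of the unnormalized Laplacian, and the particular metrics returned are all hypothetical choices, and toy 2-D vectors stand in for real sentence embeddings.

```python
import numpy as np


def spectral_metrics(embeddings, threshold=0.5):
    """Build a cosine-similarity threshold graph and summarize its Laplacian spectrum.

    `embeddings` is an (n_sentences, dim) array. The threshold and the choice
    of metrics are illustrative assumptions, not values taken from the paper.
    """
    X = np.asarray(embeddings, dtype=float)

    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.clip(norms, 1e-12, None)
    sim = X @ X.T

    # Unweighted adjacency: connect sentence pairs whose similarity meets the threshold.
    A = (sim >= threshold).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops

    # Unnormalized graph Laplacian L = D - A.
    L = np.diag(A.sum(axis=1)) - A

    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(L)

    return {
        "n_edges": int(A.sum() / 2),
        # Second-smallest eigenvalue (algebraic connectivity).
        "fiedler_value": float(eigvals[1]) if len(eigvals) > 1 else 0.0,
        "largest_eigenvalue": float(eigvals[-1]),
        # The number of (near-)zero eigenvalues equals the number of connected components.
        "n_components": int(np.sum(eigvals < 1e-8)),
    }


# Toy stand-in for four sentence embeddings: two pairs of near-parallel vectors.
demo = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
metrics = spectral_metrics(demo, threshold=0.5)
```

In this toy input the two near-parallel pairs form two disconnected edges, so the Laplacian has two zero eigenvalues and the algebraic connectivity is zero. In practice the rows of `embeddings` would come from a sentence encoder applied to each sentence of a chain-of-thought trace, and the resulting metrics would feed a downstream classifier or statistical test.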