Topological Complexity of Reasoning Chains in Large Language Models: Persistent Homology of Attention Manifolds
Keywords: Large language models, chain-of-thought, persistent homology, attention mechanisms, transformer, Betti numbers, reasoning capacity, topological data analysis, Vietoris-Rips complex
TL;DR: Persistent homology of attention matrices bounds reasoning chain length and predicts accuracy; topology-guided decoding improves CoT by 4--7\%.
Abstract: We introduce a topological framework for analyzing and predicting the reasoning capabilities of large language models. By constructing simplicial complexes from attention weight matrices across transformer layers, we compute persistent homology invariants (Betti numbers $\beta_0, \beta_1, \beta_2$) that capture the structural complexity of information flow during chain-of-thought reasoning. Our main theoretical contribution is a Topological Reasoning Capacity Theorem: for a transformer with $L$ layers and $H$ attention heads, the maximum reasoning chain length it can faithfully represent is bounded by the total persistence $\Pi_L = \sum_{k=1}^L \sum_{i} \big|\text{death}(\sigma_i^k) - \text{birth}(\sigma_i^k)\big|$ of the attention filtration. We prove this bound is tight up to logarithmic factors. Empirically, we demonstrate on GSM8K, MATH, and ARC-Challenge that (i) topological complexity strongly correlates ($r^2 \geq 0.87$) with reasoning accuracy, (ii) ``reasoning collapse'' in long chains corresponds to homological dimension reduction, and (iii) our topology-guided decoding strategy improves chain-of-thought accuracy by 4--7\% across GPT-4, LLaMA-70B, and Mistral-7B without additional training. Our framework provides the first mathematically rigorous characterization of what makes some reasoning chains succeed and others fail.
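The core quantity in the abstract, total persistence $\Pi_L$, can be illustrated in code. The sketch below is a minimal, hypothetical example: it assumes an attention matrix is converted to a symmetric distance matrix (one plausible choice, not necessarily the paper's exact construction) and uses the off-the-shelf `ripser` library to compute the Vietoris-Rips persistence diagrams up to $\beta_2$ and sum the bar lengths.

```python
# Hedged sketch: total persistence of an attention-derived Vietoris-Rips filtration.
# The attention-to-distance mapping below is an illustrative assumption, not the
# paper's stated construction.
import numpy as np
from ripser import ripser  # pip install ripser


def total_persistence(attn: np.ndarray, maxdim: int = 2) -> float:
    """Sum of |death - birth| over all finite bars, across dimensions 0..maxdim."""
    # Symmetrize and map high attention weight -> small distance (assumption).
    sym = 0.5 * (attn + attn.T)
    dist = 1.0 - sym / (sym.max() + 1e-12)
    np.fill_diagonal(dist, 0.0)

    # One persistence diagram per homology dimension (beta_0, beta_1, beta_2).
    dgms = ripser(dist, distance_matrix=True, maxdim=maxdim)["dgms"]
    total = 0.0
    for dgm in dgms:
        finite = dgm[np.isfinite(dgm[:, 1])]  # drop infinite bars
        total += np.abs(finite[:, 1] - finite[:, 0]).sum()
    return total


# Pi_L would then be the sum of total_persistence over all layers (and heads).
```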
Submission Number: 148