Abstract: Large Language Models (LLMs) have benefited enormously from scaling, yet these gains are bounded by five fundamental limitations: (1) hallucination, (2) context compression, (3) reasoning degradation, (4) retrieval fragility, and (5) multimodal misalignment. While existing surveys describe these phenomena empirically, they lack a rigorous theoretical synthesis connecting them to the foundational limits of computation, information, and learning. This work closes that gap by presenting a unified, proof-informed framework that formalizes the inherent theoretical ceilings of LLM scaling. First, computability constraints imply an irreducible residue of error: for any computably enumerable model family, diagonalization guarantees inputs on which some model must fail, and undecidable queries (e.g., halting-style tasks) induce infinite failure sets for all computable predictors. Second, information-theoretic and statistical constraints bound attainable accuracy even on decidable tasks: finite description length enforces compression error, and long-tail factual knowledge requires prohibitive sample complexity. Third, geometric and computational effects compress long contexts far below their nominal size due to positional under-training, encoding attenuation, and softmax crowding. We further show how likelihood-based training favors pattern completion over inference, how retrieval under token limits suffers from semantic drift and coupling noise, and how multimodal scaling inherits shallow cross-modal alignment. Throughout, we pair theorems with empirical evidence to delineate where scaling helps, where it saturates, and where it cannot progress, providing both theoretical foundations and practical mitigation paths such as bounded-oracle retrieval, positional curricula, and sparse or hierarchical attention.
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: The following changes have been made in the revised manuscript:
**Theorem 2 proof corrected (Section 2.1).** The reviewer identified that the original construction $i_k := k \bmod (k+1)$ evaluates to $k$ for all $k \geq 0$, collapsing the result to Theorem 1's single-failure guarantee. We replaced this with a Cantor pairing construction: letting $\pi: \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be the inverse of $\langle a,b \rangle = \frac{(a+b)(a+b+1)}{2} + a$, we set $i_k := a_k$ where $\pi(k) = (a_k, b_k)$. Since every model index $i$ appears as the first component for infinitely many $k$ (one per $b \in \mathbb{N}$), the construction $f'(s_k) := \mathrm{flip}(h_{i_k}(s_k))$ now correctly forces each model to hallucinate on infinitely many inputs. The theorem statement and all downstream arguments are unchanged.
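For concreteness, a minimal Python sketch of the corrected construction (ours, not taken from the manuscript; the helper name `cantor_inverse` is illustrative): it inverts the pairing convention above and checks that every model index recurs as the first component $a_k$ for infinitely many $k$.

```python
import math

def cantor_inverse(k: int) -> tuple[int, int]:
    # Inverse of the paper's pairing <a,b> = (a+b)(a+b+1)/2 + a.
    w = (math.isqrt(8 * k + 1) - 1) // 2   # diagonal index w = a + b
    t = w * (w + 1) // 2                   # smallest code on diagonal w
    a = k - t
    return a, w - a                        # (a_k, b_k)

# Each model index i recurs as a_k for infinitely many k: k = <i, b> for b = 0, 1, 2, ...
for i in range(3):
    ks = [(i + b) * (i + b + 1) // 2 + i for b in range(4)]
    assert all(cantor_inverse(k)[0] == i for k in ks)
    print(f"model index {i} is targeted at steps k = {ks}, ...")
```

By contrast, the original $i_k := k \bmod (k+1)$ visits each index at most once, which is why it collapsed to Theorem 1's single-failure guarantee.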
**Formal hallucination definition added (Section 2).** We introduced Definition 1 at the start of Section 2, distinguishing factual hallucination ($h(x) \neq f(x)$), faithfulness hallucination ($h(x \mid C) \not\models C$), and intrinsic hallucination (internal inconsistency). We explicitly scope the impossibility results (Theorems 1-3) to factual hallucination.
**Expanded per-section roadmap (Section 1).** The introduction now includes a structured summary of each section (Sections 2-8), with key results and dependencies stated upfront, enabling selective reading.
**Sections 5 and 6 tightened.** In Section 5 (Retrieval fragility), we compressed the memory contamination defense discussion and streamlined the pipeline limitations subsection to focus on the core theoretical gap between attention-based fusion and Bayes-optimal intent marginalization. In Section 6 (Multimodal misalignment), we removed the computational inefficiency subsection (generic quadratic attention complexity), compressed the VQ training objective details, the advanced architectures discussion, the post-training alignment subsection, and the promising directions subsection. Both sections now focus more tightly on claims that connect to the paper's theoretical framework. Qualifying language has been added where claims are empirically motivated rather than formally proven.
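As a reading aid, the gap referenced above can be paraphrased schematically (our notation, not the manuscript's; $q$ is the query, $z$ a latent retrieval intent, $d_i$ retrieved passages, $y$ the answer): a Bayes-optimal reader marginalizes over intents, whereas attention-based fusion commits to a single softmax-weighted mixture of passages.

```latex
% Schematic paraphrase (our notation): intent marginalization vs. fused attention.
p^{\star}(y \mid q) = \sum_{z} p(z \mid q)\, p(y \mid q, z)
\qquad \text{vs.} \qquad
p_{\mathrm{attn}}(y \mid q) = p\!\Big(y \;\Big|\; q, \textstyle\sum_{i} \alpha_i(q)\, d_i\Big)
```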
Assigned Action Editor: ~Yuan_Cao1
Submission Number: 6695