On inherent limitations of GPT/LLM \\ Architecture

Serge Berger

On inherent limitations of GPT/LLM \\ Architecture

Serge Berger

16 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: 0-1 laws, first-order logic, probabilistic spaces, finite graphs

TL;DR: We prove, using the first-order logic, that GPT/LLM architecture has inherent limitations in theorem proving and problem solving

Abstract: This paper shows that reasoning/proving issues of $GPT/LLM$ are an inherent logical consequence of the architecture. Namely, they are due to a schema of its prediction mechanism of the next token in a sequence, and randomization involved in the process. After the natural formalization of the problem into a domain of finite graphs, $G({\omega})$, we prove the following general theorem: For almost all proofs, any learning algorithm of inference, that uses randomization in $G({\omega})$, and necessitates veracity of inference, is almost surely a literal learning. In the context, "literal learning" stands for one which is either vacuous, i.e. $\forall x~[P(x) \implies Q(x)]$ where $P(x)$ is false for every $x$, or create a random inference from a false assumption (hallucination), or it essentially memorizes the inferences from training/synthetic data. A few corollaries follow. For instance, if its formulation is somewhat original, it is easy to notice the issue of solving mathematical problems with $LLMs$ in the case of even low-complexity tasks. Since its solution is unlikely to be found in a holistic form in a training dataset, a correct proof is not to be expected. It is because, in a rigorous context, $GPT$ has exponentially decreasing odds of finding a valid proof of the result unless it simply “repeats” a known proof, perhaps with trivial modifications. Another observation is that the degradation has an exponential rate by the length of a proof. In other words, an attempt to prove a complex enough statement virtually has no chance to be successful. In a novel rigorous context (i.e., when $GPT$-based architecture is looking to prove a new result, for instance, a hypothesis), that is virtually impossible even for a long enough fragment. The probability of success becomes infinitesimal quickly for either a fragment of possible proof or a weaker non-trivial statement. That also was empirically shown for data mixtures and confirmed experimentally.

Primary Area: foundation or frontier models, including LLMs

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 1160

Loading