Keywords: 0-1 laws, first-order logic, probabilistic spaces, finite graphs
TL;DR: We prove, using first-order logic, that the GPT/LLM architecture has inherent limitations in theorem proving and problem solving.
Abstract: This paper shows that the reasoning/proving issues of $GPT/LLM$ systems are an inherent logical consequence of the architecture. Namely, they stem from the schema of its next-token prediction mechanism and the randomization involved in the process.
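Schematically, the stochastic mechanism referred to here can be written in generic notation (the symbols $P_\theta$ and $x_t$ are ours, not the paper's): each continuation is sampled as
$$x_{t+1} \sim P_\theta(\,\cdot \mid x_1, \dots, x_t\,),$$
i.e., drawn from a learned conditional distribution over the next token rather than derived deterministically from the preceding context.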
After a natural formalization of the problem in the domain of finite graphs, $G(\omega)$, we prove the following general theorem:
For almost all proofs, any learning algorithm for inference that uses randomization in $G(\omega)$ and requires veracity of inference is almost surely literal learning.
In the context, "literal learning" stands for one which is either vacuous, i.e. $\forall x~[P(x) \implies Q(x)]$ where $P(x)$ is false for every $x$, or create a random inference from a false assumption (hallucination), or it essentially memorizes the inferences from training/synthetic data.
Several corollaries follow. For instance, even a low-complexity mathematical problem whose formulation is somewhat original exposes the difficulty of solving mathematical problems with $LLMs$: since its solution is unlikely to be found in holistic form in a training dataset, a correct proof is not to be expected.
This is because, in a rigorous context, $GPT$ has exponentially decreasing odds of finding a valid proof of the result unless it simply “repeats” a known proof, perhaps with trivial modifications. Another observation is that the degradation is exponential in the length of the proof; in other words, an attempt to prove a sufficiently complex statement has virtually no chance of succeeding.
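As a back-of-the-envelope illustration of this rate (under an independence assumption that is ours, not the theorem's): if each of the $n$ steps of a candidate proof is valid with probability at most $p<1$, independently of the others, then
$$\Pr[\text{all } n \text{ steps valid}]\ \le\ p^{\,n}\ \longrightarrow\ 0 \quad (n\to\infty),$$
i.e., the chance of a fully valid derivation decays exponentially in the proof length $n$.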
In a novel rigorous context (i.e., when a $GPT$-based architecture seeks to prove a new result, for instance a hypothesis), success is virtually impossible even for a sufficiently long fragment. The probability of success quickly becomes infinitesimal, whether for a fragment of a possible proof or for a weaker non-trivial statement. This has also been shown empirically for data mixtures and confirmed experimentally.
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1160