Transformer-Based Large Language Models Are Not General Learners: A Universal Circuit Perspective

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Large Language Model, Transformer, Universal Circuit
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We show that Transformer-based large language models are not general learners from the perspective of universal circuits.
Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, evoking perceptions of "sparks of Artificial General Intelligence (AGI)" (Bubeck et al., 2023). A key question naturally arises: *Can foundation models lead to AGI?* In this work, we partially answer this question by formally characterizing the capabilities of Transformer-based LLMs (T-LLMs) from the perspective of universal circuits. By investigating the expressive power of realistic T-LLMs as universal circuits, we show that a T-LLM of size $\operatorname{poly}(n)$ cannot perform all basic operators on inputs of length $O\left(\operatorname{poly}(\log n)\right)$. We also demonstrate that a constant-depth, $\operatorname{poly}(n)$-size, log-precision T-LLM cannot faithfully execute prompts of complexity $n$. Our analysis provides a concrete theoretical foundation for the claim that T-LLMs can only serve as universal circuits for limited function classes; in other words, T-LLMs are not general learners. Furthermore, we show that a constant-depth, $\operatorname{poly}(n)$-size, log-precision T-LLM can memorize $O\left(\operatorname{poly}(n)\right)$ instances, which may partially explain the apparent inconsistency between LLMs' empirical successes and our negative results. To the best of our knowledge, our work takes the first step towards analyzing the limitations of T-LLMs as general learners within a rigorous theoretical framework. Our results advance the understanding of LLMs' capabilities and highlight the need for innovative architecture designs beyond Transformers to overcome current limitations.
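A hedged, reader-side sketch of why the first negative result is plausible (an illustrative Shannon-style counting argument; the symbols $m$, $k$, and the vocabulary $V$ are assumptions introduced here, not the submission's notation): if a fixed model can only be "programmed" through its prompt, then the number of functions reachable by prompting is bounded by the number of prompts, whereas the number of Boolean functions on $m = \log^{k} n$ input bits is doubly exponential in $m$:
$$
\#\{f:\{0,1\}^{m}\to\{0,1\}\} \;=\; 2^{2^{m}} \;=\; 2^{\,n^{\log^{k-1} n}}
\qquad\text{vs.}\qquad
\#\{\text{prompts of length } \operatorname{poly}(n)\} \;\le\; |V|^{\operatorname{poly}(n)} \;=\; 2^{O(\operatorname{poly}(n))}.
$$
For any $k>1$ the left-hand count strictly exceeds the right-hand one, so some function on $O\left(\operatorname{poly}(\log n)\right)$ bits cannot be realized by any prompt to a fixed $\operatorname{poly}(n)$-size model. This is only a sketch of the style of argument suggested by the abstract, not a restatement of the paper's proof.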
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8842