Transformer-Based Large Language Models Are Not General Learners: A Universal Circuit Perspective

Published: 27 Oct 2023, Last Modified: 24 Apr 2024, ICBINB 2023
Keywords: Large Language Model, Transformer, Universal Circuit
TL;DR: We show that Transformer-based large language models are not general learners, from the perspective of universal circuits.
Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, evoking perceptions of "sparks of Artificial General Intelligence (AGI)". A key question naturally arises: *Can foundation models lead to AGI?* In this work, we partially answer this question by formally analyzing the capabilities of Transformer-based LLMs (T-LLMs) from the perspective of universal circuits. By investigating the expressive power of realistic T-LLMs as universal circuits, we show that a T-LLM of size $\operatorname{poly}(n)$ cannot perform all the basic operators of input length $O\left(\operatorname{poly}(\log n)\right)$. We also demonstrate that a constant-depth, $\operatorname{poly}(n)$-size, log-precision T-LLM cannot faithfully execute prompts of complexity $n$. Our analysis provides a concrete theoretical foundation for the claim that T-LLMs can only be universal circuits for limited function classes; in other words, T-LLMs are not general learners. Furthermore, we show that a constant-depth, $\operatorname{poly}(n)$-size, log-precision T-LLM can memorize $O\left(\operatorname{poly}(n)\right)$ instances, which could partially explain the seeming inconsistency between LLMs' empirical successes and our negative results. To the best of our knowledge, our work takes the first step towards analyzing the limitations of T-LLMs as general learners within a rigorous theoretical framework. Our results advance the understanding of LLMs' capabilities and highlight the need for innovative architecture designs beyond Transformers to overcome current limitations.
Submission Number: 15