We investigate the mechanisms behind emergence in large language models from the viewpoint of the regularity of the optimal response function $f^*$ on the space of prompt tokens. Based on theoretical justification, we provide an interpretation that the derivatives of $f^*$ are in general unbounded and the model gives up reasoning in regions where the derivatives are large. In such regions, instead of predicting $f^*$, the model predicts a smoothed version obtained via an averaging operator. The threshold on the norm of the derivatives beyond which a region is given up increases with the number of parameters $N$, causing emergence. The relation between regularity and emergence is supported by experiments on arithmetic tasks such as multiplication and summation, as well as other tasks. Our interpretation also sheds light on why fine-tuning and Chain-of-Thought can significantly improve LLM performance.
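For concreteness, one possible way to write down this interpretation is sketched below; the ball radius $\delta$, averaging operator $A_{\delta}$, and give-up threshold $\tau(N)$ are illustrative notation introduced here, not taken from the abstract:
$$
(A_{\delta} f^{*})(x) \;=\; \frac{1}{|B_{\delta}(x)|} \int_{B_{\delta}(x)} f^{*}(y)\, \mathrm{d}y,
\qquad
\widehat{f}_{N}(x) \;=\;
\begin{cases}
f^{*}(x) & \text{if } \|\nabla f^{*}(x)\| \le \tau(N),\\
(A_{\delta} f^{*})(x) & \text{otherwise,}
\end{cases}
$$
where $\widehat{f}_{N}$ denotes the model's prediction and $\tau(N)$ increases with the parameter count $N$, so that more high-derivative regions are predicted exactly (rather than via the smoothed average) as $N$ grows, producing the observed emergence.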