We investigate the mechanisms behind emergence in large language models from the viewpoint of the regularity of the optimal response function $f^*$ on the space of prompt tokens. Based on theoretical justification, we provide an interpretation that the derivatives of $f^*$ are in general unbounded and the model gives up reasoning in regions where the derivatives are large. In such regions, instead of predicting $f^*$, the model predicts a smoothed version obtained via an averaging operator. The threshold on the norm of the derivatives beyond which a region is given up increases with the number of parameters $N$, causing emergence. The relation between regularity and emergence is supported by experiments on arithmetic tasks such as multiplication and summation, as well as other tasks. Our interpretation also sheds light on why fine-tuning and Chain-of-Thought can significantly improve LLM performance.
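For concreteness, one possible way to write down this interpretation is sketched below; the ball radius $\delta$, averaging operator $A_{\delta}$, and give-up threshold $\tau(N)$ are illustrative notation introduced here, not taken from the abstract:
$$
(A_{\delta} f^{*})(x) \;=\; \frac{1}{|B_{\delta}(x)|} \int_{B_{\delta}(x)} f^{*}(y)\, \mathrm{d}y,
\qquad
\widehat{f}_{N}(x) \;=\;
\begin{cases}
f^{*}(x) & \text{if } \|\nabla f^{*}(x)\| \le \tau(N),\\
(A_{\delta} f^{*})(x) & \text{otherwise,}
\end{cases}
$$
where $\widehat{f}_{N}$ denotes the model's prediction and $\tau(N)$ increases with the parameter count $N$, so that more high-derivative regions are predicted exactly (rather than via the smoothed average) as $N$ grows, producing the observed emergence.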