Intention Model: A Novel Explanation for In-context Learning

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: In-context learning, Large language models
TL;DR: We propose the intention model, a novel theoretical explanation for ICL.
Abstract: In-context learning (ICL) has demonstrated remarkable success in enabling large language models (LLMs) to perform a downstream task simply by conditioning on a few input-output demonstrations. Unlike traditional learning paradigms, ICL does not require model updates, which has attracted significant interest in understanding the mechanisms behind LLMs’ ICL capabilities. Some works study ICL empirically to reveal its multifaceted nature, while others aim to explain theoretically how ICL can emerge. However, current theoretical analyses are only weakly connected to empirical findings because they rely on strong assumptions, e.g., perfect LLMs and ideal demonstrations. This work proposes an intention model, a novel theoretical framework for explaining ICL. Under mild assumptions, we present a "no-free-lunch" theorem for ICL: whether ICL emerges depends on the prediction error and prediction noise, which are determined by i) the LLM's next-token prediction error, ii) the LLM's prediction smoothness, and iii) the quality of the demonstrations. Moreover, our intention model provides a novel explanation for the learning behavior of ICL under various input-output relations, e.g., learning with flipped labels. This explanation is consistent with our experimental observations.
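To make the "flipped labels" input-output relation concrete, the Python sketch below builds an ICL prompt from input-output demonstrations and can optionally swap each demonstration's label before the unlabeled query is appended. The prompt template, the `build_icl_prompt` helper, and the binary sentiment labels are hypothetical illustrations, not the experimental setup used in this work; an LLM would then be asked to complete the final `Output:` line.

```python
from typing import Dict, List, Tuple

# Hypothetical binary label mapping, used only for illustration.
FLIP: Dict[str, str] = {"positive": "negative", "negative": "positive"}


def build_icl_prompt(
    demos: List[Tuple[str, str]], query: str, flip_labels: bool = False
) -> str:
    """Concatenate input-output demonstrations and a query into one prompt.

    With flip_labels=True, every demonstration label is swapped, which gives
    the "flipped labels" input-output relation. The query stays unlabeled so
    the model's completion is its prediction.
    """
    blocks = []
    for text, label in demos:
        shown = FLIP[label] if flip_labels else label
        blocks.append(f"Input: {text}\nOutput: {shown}")
    # The final block leaves "Output:" open for the model to fill in.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [("a delightful film", "positive"), ("a dull, tedious plot", "negative")]
    print(build_icl_prompt(demos, "an inspiring story", flip_labels=True))
```

Comparing a model's completions for the same query with `flip_labels=False` and `flip_labels=True` is one simple way to probe whether it follows the demonstrated input-output relation or its prior label semantics.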
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9067