Keywords: In-context learning, Auto-regressive next-token prediction, Generalization performance, PAC-Bayesian
Abstract: Large language models (LLMs) have demonstrated remarkable in-context learning (ICL) abilities. However, existing theoretical analyses of ICL exhibit two main limitations: \textbf{(a) Limited \textit{i.i.d.} Setting.} Most studies focus on supervised function-learning tasks where prompts are constructed from \textit{i.i.d.} input-label pairs. This \textit{i.i.d.} assumption diverges significantly from real language-learning scenarios, where prompt tokens are interdependent. \textbf{(b) Lack of Emergence Explanation.} Most of the literature answers \textbf{\textit{what}} ICL does from an implicit-optimization perspective but falls short of elucidating \textbf{\textit{how}} ICL emerges and how the pre-training phase affects ICL. To address (a), we adopt a more practical paradigm, \textbf{\textit{auto-regressive next-token prediction (AR-NTP)}}, which closely aligns with how language models are actually trained. Within AR-NTP, we emphasize prompt token-dependency: each subsequent token is predicted based on the preceding sequence. To address (b), we formalize a systematic pre-training and ICL framework, highlighting the layer-wise structure of sequences and topics alongside a two-level expectation. We then derive data-dependent, topic-dependent and optimization-dependent PAC-Bayesian generalization bounds for pre-trained LLMs, showing that \textbf{\textit{ICL emerges from the generalization of sequences and topics}}. Our theory is supported by experiments on numerical linear dynamic systems, the synthetic GINC dataset, and real-world language datasets.
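For reference, the AR-NTP paradigm described in the abstract can be summarized by the standard next-token log-likelihood objective; the notation below (sequence $x_{1:T}$, model parameters $\theta$) is a generic sketch and not necessarily the paper's own symbols:
\[
\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{x_{1:T}}\!\left[\sum_{t=1}^{T} \log p_{\theta}\!\left(x_t \mid x_{1:t-1}\right)\right],
\]
i.e., each token is predicted from its entire preceding prefix, which is the token-dependency the paper contrasts with \textit{i.i.d.} input-label prompts.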
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7103