Keywords: In-Context Learning
TL;DR: This paper explains the dual operating modes of ICL and resolves previously unexplained phenomena.
Abstract: In-context learning (ICL) exhibits dual operating modes: ***task learning***, i.e., acquiring a new skill from in-context samples, and ***task retrieval***, i.e., locating and activating a relevant pretrained skill. Recent theoretical work proposes various mathematical models to analyze ICL, but none fully explains this duality. In this work, we analyze the dual operating modes by leveraging assumptions on the pretraining data. Based on our analysis, we obtain a quantitative understanding of the two operating modes of ICL. We first explain a previously unexplained phenomenon observed with real-world large language models (LLMs), where the ICL risk initially increases and then decreases as more in-context examples are provided. We also analyze ICL with biased labels, e.g., zero-shot ICL, where in-context examples are assigned random labels, and predict the bounded efficacy of such approaches. We corroborate our analysis and predictions through extensive experiments on real-world LLMs.
Submission Number: 63