Understanding the Transient Nature of In-Context Learning: The Window of Generalization

Published: 10 Oct 2024, Last Modified: 09 Nov 2024 · SciForDL Poster · CC BY 4.0
TL;DR: In-Context Learning can be transient when there exists a better solution that fits only the training distribution.
Abstract: In-Context Learning (ICL) is one of the main mechanisms driving the few-shot learning capabilities of large language models (LLMs). A rich literature explores the causal factors giving rise to this mechanism, while recent studies have pointed out that it can be transient. In this work, we study ICL on a synthetic task consisting of a probabilistic mixture of Markov chains, which is simple enough to allow theoretical analysis yet rich enough to reproduce multiple phenomena discussed in the ICL literature. Here, we focus on analyzing the transient nature of ICL using this setup and elucidate the role of data and model training using a mechanistic phase diagram. Our findings show that: 1) a certain level of data diversity is required for ICL; 2) a non-generalizing Bayesian solution might arise later in training if its circuit complexity is higher. We conclude: ICL, or any other generalizing solution, is subject to transience if there exists a better solution, accessible by gradient descent, that narrowly fits the training distribution.
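The abstract does not specify the data-generating process in detail. As a rough illustration only, a mixture-of-Markov-chains task of the kind described can be sketched as follows: each training sequence is produced by first drawing a latent transition matrix from a fixed set, then rolling out a chain from it. All parameters below (two 2-state chains, uniform mixture and initial state) are hypothetical choices for illustration, not the paper's actual setup.

```python
import random

def sample_sequence(transition_matrices, length, seed=None):
    """Sample one sequence from a mixture of Markov chains:
    draw a latent chain uniformly at random, then roll out `length` states."""
    rng = random.Random(seed)
    T = rng.choice(transition_matrices)  # latent chain for this sequence
    n_states = len(T)
    state = rng.randrange(n_states)      # uniform initial state (assumption)
    seq = [state]
    for _ in range(length - 1):
        # transition according to the row of T for the current state
        state = rng.choices(range(n_states), weights=T[state])[0]
        seq.append(state)
    return seq

# Two hypothetical 2-state chains with contrasting transition structure
chains = [
    [[0.9, 0.1], [0.1, 0.9]],  # "sticky" chain: tends to repeat its state
    [[0.1, 0.9], [0.9, 0.1]],  # "alternating" chain: tends to flip state
]
seq = sample_sequence(chains, length=20, seed=0)
```

A model trained on such sequences can either memorize unigram/bigram statistics of the training mixture or learn to infer the latent chain from context; the tension between these two solutions is the kind of phenomenon the paper's phase diagram analyzes.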
Style Files: I have used the style files.
Submission Number: 74