A Theoretical Analysis of In-context Task Retrieval and Learning

Ziqian Lin; Kangwook Lee

21 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: In-context Learning, Task Learning, Task Retrieval, Bayesian Inference, Noisy Linear Regression

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: The study theorizes two modes of in-context learning, "task learning" and "task retrieval", and investigates the influence of pre-training dataset noise via a loss upper bound, Bayesian simulations, and practical Transformer evaluations.

Abstract: In-context learning (ICL) can serve two different purposes: task retrieval and task learning. Task retrieval recalls a pre-trained task from in-context examples drawn from a task that closely approximates it, while task learning learns a task from the in-context examples themselves. To analyze these two modes rigorously, we propose generative models for both the pre-training data and the in-context samples. Under these models, with mean squared error as the risk measure, we show that the in-context prediction of a Bayes-optimal next-token predictor equals the posterior mean of the label conditioned on the in-context samples. From this equivalence, we derive risk upper bounds for in-context learning. We reveal a phenomenon unique to task retrieval: as the number of in-context samples increases, the risk upper bound first decreases and then increases. This implies that more in-context examples could potentially worsen task retrieval. We validate our analysis with numerical computations in various scenarios and confirm that the findings replicate in an actual Transformer implementation.
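The posterior-mean prediction described in the abstract can be illustrated with a minimal sketch for noisy linear regression. The abstract does not specify the generative models, so everything below is an assumption made for illustration: a single isotropic Gaussian prior over the task vector (`prior_mean`, `prior_var`), Gaussian label noise (`noise_var`), and the hypothetical function name `posterior_mean_prediction`. It is not the authors' model or code.

```python
import numpy as np

def posterior_mean_prediction(X, y, x_query, prior_mean, prior_var, noise_var):
    """Posterior-mean label for x_query.

    Assumed model (illustrative, not taken from the paper):
      w ~ N(prior_mean, prior_var * I)
      y_i = x_i^T w + eps_i,  eps_i ~ N(0, noise_var)
    """
    d = X.shape[1]
    # Standard conjugate Gaussian update for the task vector w.
    precision = np.eye(d) / prior_var + X.T @ X / noise_var
    rhs = prior_mean / prior_var + X.T @ y / noise_var
    w_post = np.linalg.solve(precision, rhs)
    # Noise has zero mean, so E[y | data, x_query] = x_query^T E[w | data].
    return float(x_query @ w_post)

# Usage: with no in-context samples the prediction falls back to the prior;
# with many samples it approaches the task that generated the examples.
rng = np.random.default_rng(0)
d = 3
w_true = rng.normal(size=d)
x_query = rng.normal(size=d)
X = rng.normal(size=(200, d))
y = X @ w_true + 0.1 * rng.normal(size=200)

pred_prior = posterior_mean_prediction(
    np.zeros((0, d)), np.zeros(0), x_query, np.zeros(d), 1.0, 0.01
)
pred_data = posterior_mean_prediction(X, y, x_query, np.zeros(d), 1.0, 0.01)
```

Under this assumed single-Gaussian prior, more samples only move the prediction toward the sampled task. The non-monotonic risk bound reported in the abstract concerns the task-retrieval setting under the authors' own models, which this sketch does not reproduce.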

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: pdf

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 3000
