In-Context Benign Overfitting: A Feature-Selection Model in In-Context Linear Regression

Published: 02 Mar 2026, Last Modified: 02 Mar 2026, Sci4DL 2026, CC BY 4.0
Keywords: in-context learning, linear regression, benign overfitting, linear attention, transformers
Abstract: In in-context learning (ICL), a frozen pre-trained model solves tasks by conditioning on a prompt of a few input–output examples, without gradient updates. If the task was present in pretraining but the particular prompt sequence was not, the resulting in-distribution generalization is retrieval-based ICL. Learning-based ICL instead reflects out-of-distribution generalization: the model succeeds on prompts generated by a novel task. Empirically, both forms improve with scale. By analogy to benign overfitting in supervised learning, we call this in-context benign overfitting: larger models more faithfully memorize the pretraining tasks (improving retrieval ICL) while also generalizing better to novel tasks (improving learning ICL). We prove that this phenomenon already arises in a minimal in-context linear-regression feature-selection model. In contrast, standard in-context linear-regression models exhibit a retrieval–learning tradeoff, where the emergence of learning-based ICL coincides with degraded retrieval-based performance.
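The in-context linear-regression setting the abstract refers to can be sketched with a minimal simulation. The snippet below is an illustrative assumption, not the paper's model: it generates a prompt from a task vector `w` and answers a query with the idealized least-squares in-context predictor, the benchmark solution that learning-based ICL approximates on a novel task. The dimension `d`, prompt length `n`, and all function names are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 32  # feature dimension and prompt length (assumed values)

def make_prompt(w):
    """Sample a noiseless prompt of (x, y) pairs generated by task vector w."""
    X = rng.standard_normal((n, d))
    y = X @ w
    return X, y

def in_context_predict(X, y, x_query):
    """Idealized learning-based ICL: fit w_hat by least squares on the
    prompt examples, then answer the query with it."""
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x_query @ w_hat

# A novel task never seen in pretraining: with n >= d and no noise, the
# prompt identifies w, so the least-squares predictor answers exactly.
w_new = rng.standard_normal(d)
X, y = make_prompt(w_new)
x_q = rng.standard_normal(d)
print(abs(in_context_predict(X, y, x_q) - w_new @ x_q) < 1e-8)  # → True
```

Retrieval-based ICL would instead correspond to recognizing that the prompt comes from a task vector already memorized during pretraining; the paper's feature-selection model is where, per the abstract, both behaviors improve together.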
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Submission Number: 80