Unpacking In-Context Learning: Underlying Mechanism and Out-of-Distribution Generalization via Blended Training on Function Mixture
Keywords: In-Context Learning, Blended Learning, Function Mixture, Function Selection, OOD Generalization
TL;DR: Training transformers on blended prompts from multiple function classes fosters flexible pattern recognition, enhanced noise robustness, and improved out-of-distribution generalization, reducing reliance on single-function selection.
Abstract: Transformer-based language models have achieved remarkable success across a wide range of real-world tasks, yet the internal mechanisms that govern their behavior remain only partially understood. Recent research has increasingly focused on in-context learning (ICL) and on how well it generalizes beyond the training distribution. However, many of these studies are conducted under simplified conditions, where both training and evaluation use prompts derived from a single, clearly defined function. As a result, it remains unclear how models behave in more structurally diverse or ambiguous settings.
In this study, we examine ICL under a blended training paradigm, in which each training prompt contains examples sampled from multiple function classes, without any explicit task identifiers or structural signals. Using standard ICL benchmarks such as linear and quadratic classification, we assess how this training approach influences model behavior, robustness, and generalization.
Our findings indicate that under blended training, the commonly observed function selection behavior, where the model implicitly identifies and applies a single underlying function, plays a less central role. Instead, the model demonstrates more flexible pattern recognition, improved resilience to input noise, and stronger generalization to out-of-distribution tasks. These results suggest that training on structurally mixed prompts can enhance a model’s adaptability in unfamiliar scenarios.
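To make the blended training paradigm described in the abstract more concrete, the following is a minimal sketch of how a single blended prompt might be constructed. It is an illustration under stated assumptions, not the authors' actual data pipeline: the function name `sample_blended_prompt`, the parameters `n_examples`, `dim`, and `p_linear`, the Gaussian sampling of inputs and function parameters, and the per-example mixing rule are all hypothetical choices. The only element taken from the abstract is that examples from multiple function classes (here a linear and a quadratic classifier) share one prompt with no task identifier attached.

```python
import numpy as np


def sample_blended_prompt(n_examples=16, dim=4, p_linear=0.5, rng=None):
    """Build one prompt whose examples are labeled by a mix of two function classes.

    Assumption: one linear and one quadratic classifier are drawn per prompt,
    and each example is labeled by one of them chosen at random.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Hypothetical function parameters, drawn fresh for every prompt.
    w = rng.standard_normal(dim)          # linear classifier: sign(w . x)
    A = rng.standard_normal((dim, dim))   # quadratic classifier: sign(x^T A x)

    xs = rng.standard_normal((n_examples, dim))

    # Per-example choice of function class; this mask is never shown to the model.
    use_linear = rng.random(n_examples) < p_linear

    linear_scores = xs @ w
    quad_scores = np.einsum("ni,ij,nj->n", xs, A, xs)
    scores = np.where(use_linear, linear_scores, quad_scores)

    # Binary labels in {-1, +1}; no task identifier is attached to any example.
    ys = np.where(scores >= 0, 1, -1)
    return xs, ys, use_linear


xs, ys, mask = sample_blended_prompt(rng=np.random.default_rng(0))
```

Setting `p_linear` to 1.0 or 0.0 in this sketch recovers the single-function prompts used in the simplified settings the abstract contrasts against, which makes it a convenient knob for comparing the two regimes.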
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 11623