Context Similarity Structure Shapes the Emergence of Reliable In-Context and In-Weights Mixtures

ICLR 2026 Conference Submission 24869 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: In-Context Learning, Transformers, Sequence to Sequence Learning, Continuous Adaptation, In-Weights Learning
Abstract: We aim to train models that co-develop in-context learning (ICL) and in-weights learning (IWL), and flexibly switch between them based on context relevance. Such models should exploit closely related in-context examples while relying on IWL when the examples are irrelevant. Although LLMs exhibit both modes, standard task-specific fine-tuning often erodes ICL, motivating IC-Train, a form of fine-tuning with in-context examples. Prior work has shown that, under IC-Train, the emergence of ICL depends on factors such as task diversity and training duration. We show that an overlooked factor is the similarity structure between target inputs and context examples. Of the two existing modes of context-target pairing, random contexts lead to IWL dominance, while contexts containing only similar examples cause ICL to degenerate into copying labels without regard to relevance. To address this, we propose Contrastive-Context, which enforces two types of contrasts: (1) a mix of similar and random examples within each context, to induce a correct form of ICL, and (2) varying grades of similarity across contexts, to induce ICL-IWL mixtures. In experiments on real sequence-to-sequence learning tasks with four models, we show that Contrastive-Context strengthens ICL while preserving IWL, outperforming random and nearest-neighbor sampling in both in-domain and out-of-domain evaluation. Theoretical analysis and diagnostic probes confirm that contrasted contexts yield stable ICL-IWL mixtures, avoiding collapse into pure ICL, pure IWL, or copying. Our results establish similarity structure as a key driver of reliable ICL when fine-tuning an LLM for a task.
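The abstract describes Contrastive-Context only at a high level. Below is a minimal, hypothetical sketch of what the context-construction step could look like, assuming embedding-based nearest-neighbor retrieval and a per-target mixing ratio drawn uniformly at random; the function name `build_contrastive_context`, its parameters, and the mixing schedule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the Contrastive-Context sampling idea: each training
# context mixes nearest-neighbor ("similar") examples with random ones, and the
# grade of similarity is varied across contexts. All names and choices here are
# assumptions made for illustration.
import random
import numpy as np

def build_contrastive_context(target_emb, pool_embs, pool_examples,
                              context_size=8, rng=None):
    """Return a context mixing similar and random pool examples for one target."""
    rng = rng or random.Random()

    # (2) Vary the grade of similarity across contexts: draw the fraction of
    # similar examples anew for every target.
    frac_similar = rng.uniform(0.0, 1.0)
    n_similar = int(round(frac_similar * context_size))
    n_random = context_size - n_similar

    # (1) Similar part: nearest neighbors of the target by cosine similarity.
    sims = pool_embs @ target_emb / (
        np.linalg.norm(pool_embs, axis=1) * np.linalg.norm(target_emb) + 1e-8)
    nearest = np.argsort(-sims)[:n_similar]

    # Random part: uniformly sampled examples, excluding the chosen neighbors.
    chosen = set(int(i) for i in nearest)
    remaining = [i for i in range(len(pool_examples)) if i not in chosen]
    randoms = rng.sample(remaining, k=min(n_random, len(remaining)))

    context = [pool_examples[i] for i in nearest] + [pool_examples[i] for i in randoms]
    rng.shuffle(context)
    return context
```

Under this reading, IC-Train would pair each target with such a context during fine-tuning; redrawing the similar/random fraction per target is what realizes the "varying grades of similarity across contexts" described in the abstract.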
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 24869