Abstract: Transformers exhibit remarkable in-context learning capabilities, solving new tasks without requiring explicit model weight updates. However, existing training paradigms for in-context learners rely on vast, unstructured datasets, which are costly and challenging to collect, and which diverge significantly from how humans learn. Motivated by these limitations, we propose a paradigm shift: training on multiple smaller, domain-specific datasets to improve generalization. We investigate this paradigm by leveraging meta-learning to train an in-context learner across diverse, small-scale datasets from the Meta-Album benchmark. We further examine realistic scenarios, including domain streaming with curriculum learning strategies and settings where the training data is entirely unlabeled. Our experiments demonstrate that this multi-dataset approach promotes broader generalization, enhances robustness in streaming scenarios, and achieves competitive performance even under unsupervised conditions. These findings advance the in-context learning paradigm and shed light on how to bridge the gap between artificial and natural learning processes.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Ying_Wei1
Submission Number: 4335