Language Models Need Sleep: Learning to Self Modify and Consolidate Memories

ICLR 2026 Conference Submission 22027 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Language Models, Memory, Sleep
Abstract: The past few decades have witnessed significant advances in designing machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results on tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporary in-context knowledge into their long-term parameters. Inspired by the human learning process, we introduce a ``Sleep'' paradigm that allows models to continually learn, transfer their short-term fragile memories into stable long-term knowledge, and self-modify through a ``Dreaming'' process. In more detail, Sleep consists of two main stages: (1) Memory Consolidation, a parameter expansion stage with a new Reinforcement Learning (RL)-based upward distillation process, called Knowledge Seeding, where the memories of a smaller model are distilled into a \emph{larger} network to provide more capacity; (2) Dreaming, a self-improvement phase where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-context, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the Sleep stage and its contribution to improving the continual-learning capability of the models.
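To make the two-stage structure described in the abstract concrete, below is a minimal, purely illustrative Python sketch of a Sleep cycle. It is not the authors' implementation: the class and function names (Model, seed_knowledge, dream) and the toy "rehearsal" logic are hypothetical stand-ins for the paper's Knowledge Seeding distillation and RL-driven Dreaming curriculum.

```python
# Illustrative sketch only; all names and logic are hypothetical stand-ins
# for the paper's Memory Consolidation (Knowledge Seeding) and Dreaming stages.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Model:
    """Stand-in for an LLM; 'memories' holds short-term in-context knowledge,
    'knowledge' holds consolidated long-term (parametric) knowledge."""
    capacity: int
    memories: List[str] = field(default_factory=list)
    knowledge: List[str] = field(default_factory=list)


def seed_knowledge(small: Model, large: Model) -> None:
    """Stage 1 (Memory Consolidation): distill the smaller model's fragile
    short-term memories 'upward' into the larger network's long-term store."""
    large.knowledge.extend(small.memories)
    small.memories.clear()


def dream(model: Model, steps: int = 3) -> None:
    """Stage 2 (Dreaming): generate synthetic rehearsal data from existing
    knowledge and fold it back in. In the paper an RL policy would build and
    score this curriculum; here a capacity cap is a crude placeholder."""
    for step in range(steps):
        synthetic = [f"rehearsal[{step}]::{item}" for item in model.knowledge]
        model.knowledge.extend(synthetic[: model.capacity])


if __name__ == "__main__":
    awake = Model(capacity=4, memories=["fact-A", "fact-B"])
    sleeping = Model(capacity=16)          # larger network with more capacity
    seed_knowledge(awake, sleeping)        # consolidate short-term memories
    dream(sleeping)                        # self-improve via synthetic rehearsal
    print(len(sleeping.knowledge), "knowledge items after sleep")
```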
Primary Area: foundation or frontier models, including LLMs
Submission Number: 22027