Little By Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts

Haodong Lu; Chongyang Zhao; Jason Xue; Lina Yao; Kristen Moore; Dong Gong

Published: 30 Apr 2026, Last Modified: 24 Jun 2026 | ICML 2026 regular | Readers: Everyone | CC BY 4.0

TL;DR: Continual learning is incremental growth of a key–value associative memory built from rank-1 atoms.

Abstract: Continual learning (CL) with large pre-trained models aims to incrementally acquire knowledge without catastrophic forgetting. Existing LoRA-based Mixture-of-Experts (MoE) methods expand capacity by adding isolated new experts while freezing old ones, but still suffer from redundancy, interference, routing ambiguity, and consequent forgetting. We investigate the issues stemming from coarse-grained expert granularity. Coarse-grained experts (e.g., high-rank LoRA) encode low-specialty information, leading to expert duplication/interference and routing degradation/confusion as experts accumulate. In this work, we propose MoRAM (Mixture of Rank-1 Associative Memory). Grounded in the view that weight matrices act as linear associative memories, MoRAM achieves CL as incremental expansion of reusable atomic rank-1 experts as memory. Each rank-1 adapter acts as a fine-grained MoE expert or an associative memory unit. By viewing rank-1 experts as key-value memory pairs, we eliminate explicit MoE-LoRA routers with self-activation, where each memory atom evaluates its relevance via its intrinsic key. The inference process thus becomes a content-addressable retrieval and recall over the incrementally accumulated memory of learning snapshots. Extensive experiments on CLIP and LLMs show that MoRAM significantly outperforms state-of-the-art methods, achieving a better plasticity–stability trade-off, stronger generalization, and reduced forgetting. Project page: https://artificer-ai-lab.github.io/MoRAM.
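The key-value view in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name `Rank1Memory`, the ReLU gate, and the top-k selection are assumptions made for illustration, since the abstract does not specify how self-activation is computed. Each atom holds a key and a value; its relevance to an input is scored by its own key, so no separate router is involved. With the gate and the top-k selection removed, the update reduces to an ordinary rank-r LoRA term, because summing `(k_i · x) * v_i` over atoms equals `(V^T K) x`.

```python
import numpy as np


class Rank1Memory:
    """Hypothetical pool of rank-1 atoms on top of a frozen weight (illustrative only)."""

    def __init__(self, w0):
        self.w0 = w0                          # frozen pre-trained weight, shape (d_out, d_in)
        d_out, d_in = w0.shape
        self.keys = np.zeros((0, d_in))       # one key vector per atom
        self.values = np.zeros((0, d_out))    # one value vector per atom

    def add_atoms(self, keys, values):
        # Incremental growth: new atoms are appended, existing atoms are untouched.
        self.keys = np.vstack([self.keys, keys])
        self.values = np.vstack([self.values, values])

    def forward(self, x, top_k=2):
        base = self.w0 @ x
        if len(self.keys) == 0:
            return base
        scores = self.keys @ x                # each atom scores itself with its own key
        k = min(top_k, len(scores))
        active = np.argsort(scores)[-k:]      # indices of the k highest-scoring atoms
        gates = np.maximum(scores[active], 0.0)  # assumed ReLU gate, not from the paper
        return base + gates @ self.values[active]


# Usage: two atoms keyed on the first and second input coordinates.
mem = Rank1Memory(np.eye(3, 4))
mem.add_atoms(
    keys=np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]),
    values=np.array([[0.0, 0.0, 5.0], [0.0, 7.0, 0.0]]),
)
out = mem.forward(np.array([1.0, 0.0, 0.0, 0.0]), top_k=1)
print(out)  # only the first atom fires for this input
```

In this toy setup, retrieval is content-addressable in the sense that which atoms contribute depends only on how well the input matches their keys.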

Lay Summary: Large foundation models are powerful but essentially frozen after publishing — teaching them a new skill tends to overwrite old ones, like a student who forgets last semester's material when cramming for a new exam. Existing fixes add bulky "patches" of new knowledge, which mix many things together and confuse the model as they pile up. We propose MoRAM, which adds knowledge in the smallest possible pieces — single "memory atoms," each like one entry in a dictionary. Each atom decides for itself when it is relevant, so the model learns "little by little" without losing what it already knows.

Link To Code: https://artificer-ai-lab.github.io/MoRAM/

Primary Area: Deep Learning

Keywords: Continual Learning, Mixture-of-Experts, Parametric Memory


Submission Number: 10925
