TL;DR: Continual learning is incremental growth of a key–value associative memory built from rank-1 atoms.
Abstract: Continual learning (CL) with large pre-trained models aims to acquire knowledge incrementally without catastrophic forgetting. Existing LoRA-based Mixture-of-Experts (MoE) methods expand capacity by adding isolated new experts while freezing old ones, but they still suffer from redundancy, interference, and routing ambiguity, which in turn cause forgetting. We trace these issues to coarse expert granularity: coarse-grained experts (e.g., high-rank LoRA) encode weakly specialized information, so experts duplicate and interfere with one another and routing degrades as experts accumulate. In this work, we propose MoRAM (Mixture of Rank-1 Associative Memory). Grounded in the view that weight matrices act as linear associative memories, MoRAM casts CL as the incremental expansion of a memory built from reusable atomic rank-1 experts. Each rank-1 adapter can be read as a fine-grained MoE expert or, equivalently, as an associative memory unit. By viewing rank-1 experts as key-value memory pairs, we replace explicit MoE-LoRA routers with self-activation, where each memory atom evaluates its own relevance via its intrinsic key. Inference thus becomes content-addressable retrieval and recall over the incrementally accumulated memory of learning snapshots. Extensive experiments on CLIP and LLMs show that MoRAM significantly outperforms state-of-the-art methods, achieving a better plasticity–stability trade-off, stronger generalization, and reduced forgetting. Project page: https://artificer-ai-lab.github.io/MoRAM.
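Illustrative sketch (not the authors' implementation): the abstract's view of rank-1 experts as key-value memory pairs can be made concrete with a few lines of NumPy. Each atom i holds a key k_i and a value v_i, so its weight update is the rank-1 matrix v_i k_i^T and its response to an input x is v_i (k_i · x). The function name `rank1_memory_forward`, the `top_k` parameter, and the magnitude-based gating rule are assumptions made for illustration; the abstract does not specify the actual self-activation function or training procedure.

```python
import numpy as np

def rank1_memory_forward(x, W0, keys, values, top_k=2):
    """Frozen base layer plus a bank of rank-1 key-value atoms.

    x: (d_in,), W0: (d_out, d_in), keys: (n, d_in), values: (n, d_out).
    Hypothetical gating: keep the top_k atoms by |k_i . x| (an assumption).
    """
    scores = keys @ x                       # each atom scores itself with its own key
    k = min(top_k, scores.shape[0])
    active = np.argsort(-np.abs(scores))[:k]
    gate = np.zeros_like(scores)
    gate[active] = scores[active]           # inactive atoms contribute nothing
    delta = values.T @ gate                 # sum over active atoms of (k_i . x) * v_i
    return W0 @ x + delta, active

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W0 = rng.normal(size=(d_out, d_in))         # stands in for a frozen pre-trained weight
keys = np.eye(d_in)[:3]                     # three orthonormal keys (toy setup)
values = rng.normal(size=(3, d_out))

x = 2.0 * keys[1]                           # input aligned with atom 1
y, active = rank1_memory_forward(x, W0, keys, values, top_k=1)

# Growing the memory: append a new atom, leave the old ones untouched.
keys_grown = np.vstack([keys, np.eye(d_in)[5]])
values_grown = np.vstack([values, rng.normal(size=(1, d_out))])
y_grown, active_grown = rank1_memory_forward(x, W0, keys_grown, values_grown, top_k=1)
```

In this toy setup the retrieved output equals what the explicit rank-1 update `W0 + np.outer(values[1], keys[1])` would produce, and appending an atom whose key is orthogonal to `x` leaves the output unchanged. Real keys are learned and not orthogonal, so this only illustrates the retrieval arithmetic, not the method's forgetting behavior.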
Lay Summary: Large foundation models are powerful but essentially frozen once released: teaching them a new skill tends to overwrite old ones, like a student who forgets last semester's material when cramming for a new exam. Existing fixes add bulky "patches" of new knowledge, which mix many things together and confuse the model as they pile up. We propose MoRAM, which adds knowledge in the smallest possible pieces: single "memory atoms," each like one entry in a dictionary. Each atom decides for itself when it is relevant, so the model learns "little by little" without losing what it already knows.
Link To Code: https://artificer-ai-lab.github.io/MoRAM/
Primary Area: Deep Learning
Keywords: Continual Learning, Mixture-of-Experts, Parametric Memory
Submission Number: 10925