Coupling Attention and Memory: A Dynamic Memory Module for Efficient Adaptation with Pretrained LLMs

ICLR 2026 Conference Submission 19325 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Language Models
Abstract: Pretrained large language models (LLMs) are highly capable but still require adaptation to new domains. Existing fine-tuning strategies typically assume either *simultaneous* access to all target-task data (e.g., multi-task learning) or a sequential data stream, as in continual learning; the former setting must contend with task interference, while the latter must address catastrophic forgetting. In this work, we present DynMem, a unified framework that tackles both scenarios with a lightweight dynamic memory module built on top of frozen pretrained LLMs. DynMem encodes past examples into a fixed-size memory bank. We design a novel dynamic update mechanism in which new examples and existing memory entries are ranked by their *accumulated* attention scores, and the lowest-ranked entries are pruned to maintain a fixed size. To further reduce recency bias, we adopt a bi-level memory design: $\mathrm{L_1}$ Memory is actively used by the backbone LLM, while $\mathrm{L_2}$ Memory stores more diverse examples for improved effectiveness at minimal cost. This design also supports more flexible test-time scaling by allowing larger memory banks. We evaluate DynMem under both simultaneous and continual learning settings, where it consistently outperforms state-of-the-art baselines tailored to each scenario, demonstrating its potential to mitigate task interference in both simultaneous and sequential learning. In particular, DynMem outperforms a suite of specialized baselines in simultaneous adaptation across different models while using approximately 50\% fewer trainable parameters.
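Below is a minimal sketch of the kind of attention-score-based memory pruning the abstract describes, assuming each memory entry stores an example embedding and accumulates the attention mass it receives from the backbone. The class and parameter names (`DynamicMemoryBank`, `capacity`, `update_scores`) are illustrative assumptions, not the authors' implementation, and new entries are shown entering with a zero score purely for simplicity.

```python
# Hypothetical sketch of accumulated-attention-based memory pruning (not the authors' code).
# Assumption: each entry keeps an embedding plus the attention mass it has received so far;
# when the bank exceeds its capacity, the lowest-ranked entries are pruned.
from dataclasses import dataclass
from typing import List
import torch


@dataclass
class MemoryEntry:
    embedding: torch.Tensor           # representation of a past example
    accumulated_attention: float = 0.0


class DynamicMemoryBank:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: List[MemoryEntry] = []

    def update_scores(self, attention_weights: torch.Tensor) -> None:
        """Accumulate the attention mass each existing entry received in the current pass."""
        for entry, w in zip(self.entries, attention_weights.tolist()):
            entry.accumulated_attention += w

    def insert(self, new_embeddings: torch.Tensor) -> None:
        """Add new examples, then prune the lowest-ranked entries to keep a fixed size."""
        for emb in new_embeddings:
            self.entries.append(MemoryEntry(embedding=emb))
        if len(self.entries) > self.capacity:
            self.entries.sort(key=lambda e: e.accumulated_attention, reverse=True)
            self.entries = self.entries[: self.capacity]


# Toy usage: a bank of 4 entries receives attention, then two new examples trigger pruning.
bank = DynamicMemoryBank(capacity=4)
bank.insert(torch.randn(4, 16))
attn = torch.softmax(torch.randn(4), dim=0)   # attention over current memory entries
bank.update_scores(attn)
bank.insert(torch.randn(2, 16))
print(len(bank.entries))                      # -> 4
```

In this toy version a single flat bank is used; the bi-level $\mathrm{L_1}$/$\mathrm{L_2}$ design described in the abstract would presumably demote pruned or less-used entries to a secondary store rather than discard them outright.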
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 19325