Keywords: memory-augmented pretraining
Abstract: Pretrained on trillions of tokens, LLMs store a large amount of factual knowledge in their parametric memory. However, recalling facts from this memory is unreliable, particularly for long-tail knowledge: obscure facts that appear infrequently in the training data. In this work, we propose a novel approach to improve the factuality of LLMs on long-tail knowledge. We begin by identifying atomic facts that are not present in a pretrained LLM's parametric memory. These facts are then stored in an external, non-parametric memory. The model subsequently undergoes continual pretraining, which teaches it when to consult this external memory at inference time. Compared with existing approaches, our method uses a compact external memory that selectively stores only the facts not clearly present in the LLM's parametric memory, incurring minimal additional inference-time cost in both time and space. Furthermore, our method outperforms fully trained models of comparable size on knowledge-intensive benchmarks and achieves competitive results against larger models.
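To make the described pipeline concrete, the sketch below is a toy Python illustration, not the authors' implementation: the names (`Fact`, `model_recalls`, `build_external_memory`, `answer`) are hypothetical, and the rule-based recall probe and memory-consultation step are simple stand-ins for the LLM probing and learned consultation described in the abstract.

```python
"""Toy sketch (assumed, not the paper's code): keep only facts the base LM
cannot recall in a compact external store, and consult that store at
inference time; model calls are replaced with simple stand-ins."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str

    @property
    def query(self) -> str:
        return f"{self.subject} {self.relation}"


# Stand-in for probing the pretrained LM's parametric memory: a fixed set of
# "well-known" (subject, relation) pairs plays the role of what the LM recalls.
KNOWN_IN_PARAMS = {("France", "capital"), ("Einstein", "born in")}


def model_recalls(fact: Fact) -> bool:
    """Hypothetical probe: does the LM already recall this atomic fact?"""
    return (fact.subject, fact.relation) in KNOWN_IN_PARAMS


def build_external_memory(facts: list[Fact]) -> dict[str, str]:
    """Store only the atomic facts absent from parametric memory."""
    return {f.query: f.obj for f in facts if not model_recalls(f)}


def answer(query: str, memory: dict[str, str]) -> str:
    """Consult the external memory only when the query matches a stored
    long-tail fact; otherwise fall back to (simulated) parametric recall."""
    if query in memory:  # simplified stand-in for the learned consultation
        return memory[query]
    return "<answer from parametric memory>"


if __name__ == "__main__":
    facts = [
        Fact("France", "capital", "Paris"),
        Fact("Einstein", "born in", "Ulm"),
        Fact("Obscure Village X", "founded in", "1732"),  # long-tail fact
    ]
    memory = build_external_memory(facts)
    print(memory)  # only the long-tail fact is stored externally
    print(answer("Obscure Village X founded in", memory))
    print(answer("France capital", memory))
```

Because only the facts missing from parametric memory are stored, the external memory stays small, which is what keeps the additional inference-time cost minimal in the approach described above.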
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21151