Keywords: memory-augmented pretraining
Abstract: Pretrained on trillions of tokens, LLMs store a large amount of factual knowledge in their parametric memory. However, recalling facts from this memory is unreliable, particularly for long-tail knowledge: obscure facts that appear infrequently in the training data. In this work, we propose a novel approach to improve the factuality of LLMs on long-tail knowledge. We begin by identifying atomic facts that are not present in a pretrained LLM's parametric memory. These facts are then stored in an external, non-parametric memory. The model subsequently undergoes continual pretraining, which teaches it when to consult this external memory at inference time. Compared with existing approaches, our method uses a compact external memory that selectively stores only the facts not clearly present in the LLM's parametric memory, incurring minimal additional inference-time cost in both time and space. Furthermore, our method outperforms fully trained models of comparable size on knowledge-intensive benchmarks and achieves competitive results against larger models.
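To make the described pipeline concrete, the sketch below is a toy Python illustration, not the authors' implementation: the names (`Fact`, `model_recalls`, `build_external_memory`, `answer`) are hypothetical, and the rule-based recall probe and memory-consultation step are simple stand-ins for the LLM probing and learned consultation described in the abstract.

```python
"""Toy sketch (assumed, not the paper's code): keep only facts the base LM
cannot recall in a compact external store, and consult that store at
inference time; model calls are replaced with simple stand-ins."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str

    @property
    def query(self) -> str:
        return f"{self.subject} {self.relation}"


# Stand-in for probing the pretrained LM's parametric memory: a fixed set of
# "well-known" (subject, relation) pairs plays the role of what the LM recalls.
KNOWN_IN_PARAMS = {("France", "capital"), ("Einstein", "born in")}


def model_recalls(fact: Fact) -> bool:
    """Hypothetical probe: does the LM already recall this atomic fact?"""
    return (fact.subject, fact.relation) in KNOWN_IN_PARAMS


def build_external_memory(facts: list[Fact]) -> dict[str, str]:
    """Store only the atomic facts absent from parametric memory."""
    return {f.query: f.obj for f in facts if not model_recalls(f)}


def answer(query: str, memory: dict[str, str]) -> str:
    """Consult the external memory only when the query matches a stored
    long-tail fact; otherwise fall back to (simulated) parametric recall."""
    if query in memory:  # simplified stand-in for the learned consultation
        return memory[query]
    return "<answer from parametric memory>"


if __name__ == "__main__":
    facts = [
        Fact("France", "capital", "Paris"),
        Fact("Einstein", "born in", "Ulm"),
        Fact("Obscure Village X", "founded in", "1732"),  # long-tail fact
    ]
    memory = build_external_memory(facts)
    print(memory)  # only the long-tail fact is stored externally
    print(answer("Obscure Village X founded in", memory))
    print(answer("France capital", memory))
```

Because only the facts missing from parametric memory are stored, the external memory stays small, which is what keeps the additional inference-time cost minimal in the approach described above.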
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21151