CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory

Published: 18 Jun 2024, Last Modified: 16 Jul 2024 · LCFM 2024 · CC BY 4.0
Keywords: Associative Memory, Large Language Models, Training-Free Algorithm
TL;DR: We propose CAMELoT, a Consolidated Associative Memory Enhanced Long Transformer, which achieves strong long-context modeling performance with tiny input windows and without any training.
Abstract: Large Language Models (LLMs) struggle with long input sequences due to high memory and runtime costs. Memory-augmented models offer a promising solution to this problem, but existing methods have limited memory capacity and require costly re-training to integrate with the LLM. In this work, we introduce CAMELoT, a **C**onsolidated **A**ssociative **M**emory **E**nhanced **Lo**ng **T**ransformer, which has an associative memory (AM) module integrated with any pre-trained attention-based LLM. The AM module in CAMELoT consolidates token representations into a non-parametric distribution model, balancing novelty and recency, thereby giving the LLM the capability to process long input sequences without any re-training. By retrieving information from AM, CAMELoT achieves a significant perplexity reduction in long-context modeling benchmarks, e.g., 29.7% on Arxiv, even with a tiny context window of 128 tokens.
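
As a rough illustration of the consolidation idea described in the abstract (and not the paper's actual algorithm), the sketch below keeps a fixed number of memory slots: a token representation similar to an existing slot is merged into it with a running average (consolidation/recency), while a sufficiently novel token evicts the least recently used slot, and retrieval returns the consolidated values closest to a query. The class name, cosine-similarity test, threshold, and update rules are all assumptions made for exposition.

```python
# Illustrative sketch only: a toy associative memory that consolidates token
# representations into a fixed number of slots, balancing novelty and recency.
# Names, thresholds, and update rules are assumptions, not CAMELoT's algorithm.
import numpy as np


class ToyAssociativeMemory:
    def __init__(self, num_slots: int, dim: int, sim_threshold: float = 0.7):
        self.keys = np.zeros((num_slots, dim))      # slot key centroids
        self.values = np.zeros((num_slots, dim))    # slot value centroids
        self.counts = np.zeros(num_slots)           # tokens consolidated per slot
        self.last_used = np.full(num_slots, -1.0)   # recency timestamps
        self.sim_threshold = sim_threshold
        self.step = 0

    def _similarity(self, key: np.ndarray) -> np.ndarray:
        # Cosine similarity between the query key and every slot key.
        norms = np.linalg.norm(self.keys, axis=1) * np.linalg.norm(key) + 1e-8
        return (self.keys @ key) / norms

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        """Consolidate one (key, value) token representation into memory."""
        self.step += 1
        sims = self._similarity(key)
        best = int(np.argmax(sims))
        if self.counts[best] > 0 and sims[best] >= self.sim_threshold:
            # Recency/consolidation path: merge into the closest slot with a
            # running mean instead of storing the raw token.
            n = self.counts[best]
            self.keys[best] = (self.keys[best] * n + key) / (n + 1)
            self.values[best] = (self.values[best] * n + value) / (n + 1)
            self.counts[best] += 1
        else:
            # Novelty path: the token is unlike any stored slot, so it
            # replaces the least recently used slot.
            best = int(np.argmin(self.last_used))
            self.keys[best], self.values[best] = key, value
            self.counts[best] = 1
        self.last_used[best] = self.step

    def read(self, query: np.ndarray, top_k: int = 4) -> np.ndarray:
        """Return the top-k consolidated values most similar to the query."""
        sims = self._similarity(query)
        return self.values[np.argsort(-sims)[:top_k]]
```

In the paper, the retrieved entries augment a pre-trained LLM's attention over its small local context window; the lookup above is shown only as a plain nearest-neighbor retrieval to keep the sketch self-contained.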
Submission Number: 21