Keywords: language model, context compression
Abstract: Extending the context length of Transformer-based large language models (LLMs) and improving their comprehension of long inputs is constrained by finite computational resources and bounded memory capacity. This work proposes Recurrent Context Compression (RCC), a method for efficiently expanding the context window of LLMs. We also examine the common problem of degraded performance when both the instruction and the context are compressed for downstream tasks, and propose an instruction reconstruction method to mitigate it. We validate our approach on multiple tasks at a context compression rate of at least 32x: on text reconstruction it maintains a BLEU-4 score close to 0.95; on passkey retrieval it achieves nearly 100% accuracy at sequence lengths of up to 1 million tokens; and on long-text question answering it matches the F1 and Rouge scores of the non-compressed LLM while substantially reducing storage requirements.
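The abstract does not describe the RCC architecture itself, so the following is only a minimal illustrative sketch of what recurrent, chunk-wise context compression at a 32x rate could look like in PyTorch. The module name `RecurrentCompressor`, the use of learned query slots with cross-attention, and all hyperparameters are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch (not the paper's method): fold a long token sequence into a
# short sequence of memory embeddings, processing the input chunk by chunk so that
# peak memory stays bounded while the effective context keeps growing.

import torch
import torch.nn as nn


class RecurrentCompressor(nn.Module):
    def __init__(self, d_model=512, n_heads=8, chunk_len=512, compression_rate=32):
        super().__init__()
        assert chunk_len % compression_rate == 0
        self.chunk_len = chunk_len
        self.n_mem = chunk_len // compression_rate  # e.g. 512 / 32 = 16 slots per chunk
        # Learned query slots that summarize each chunk via cross-attention.
        self.mem_queries = nn.Parameter(torch.randn(self.n_mem, d_model) * 0.02)
        self.encoder = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, chunk_embeds, prev_memory=None):
        """Compress one chunk of token embeddings, conditioned on earlier memory.

        chunk_embeds: (batch, chunk_len, d_model) embeddings of the current chunk.
        prev_memory:  (batch, m, d_model) memory carried over from earlier chunks.
        Returns the updated memory of shape (batch, m + n_mem, d_model).
        """
        if prev_memory is not None:
            # Prepend the running memory so the encoder mixes old and new context.
            chunk_embeds = torch.cat([prev_memory, chunk_embeds], dim=1)
        hidden = self.encoder(chunk_embeds)
        queries = self.mem_queries.unsqueeze(0).expand(hidden.size(0), -1, -1)
        new_mem, _ = self.cross_attn(queries, hidden, hidden)
        if prev_memory is None:
            return new_mem
        return torch.cat([prev_memory, new_mem], dim=1)


if __name__ == "__main__":
    torch.manual_seed(0)
    compressor = RecurrentCompressor()
    long_context = torch.randn(1, 4 * 512, 512)    # 2048 "token" embeddings
    memory = None
    for chunk in long_context.split(512, dim=1):   # recurrent, chunk-by-chunk pass
        memory = compressor(chunk, memory)
    print(memory.shape)  # (1, 64, 512): 2048 tokens compressed 32x into 64 vectors
```

In this sketch the compressed memory, rather than the raw context, would be what the downstream LLM attends to; how RCC actually conditions the LLM on the compressed representation is left to the paper.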
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2386