Abstract: The limited context window of contemporary large language models (LLMs) hinders broader application.
In this work, we present SharedLLM, a novel approach grounded in the design philosophy of multi-grained context compression and query-aware information retrieval.
SharedLLM is composed of two short-context LLMs: a lower model (the compressor) and an upper model (the decoder).
The lower model compresses context information, while the upper model consumes the compressed context information from the lower model and performs context-aware modeling.
Information transfer between the compressor and decoder occurs only at the lowest layers to reduce redundant computation.
Based on this architecture, we introduce a specialized tree-style data structure to efficiently encode, store and retrieve multi-grained contextual information from text chunks.
This entire process, in which the sending and receiving layers are derived from the same layer of the original LLM, is referred to as self-injection.
In our evaluation on long-context modeling and understanding tasks, SharedLLM achieves results superior or comparable to those of several strong baselines, striking an effective balance between efficiency and performance.
Meanwhile, with the aforementioned design choices, SharedLLM greatly reduces memory consumption and demonstrates substantial speed-ups over other advanced baselines ($2\times$ over streaming approaches and $3\times$ over encoder-decoder architectures).
The core code of our implementation, along with training and evaluation scripts, is provided in the appendix and supplementary material.
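To make the abstract's description of the tree-style data structure and query-aware retrieval concrete, below is a minimal conceptual sketch. It is not the authors' released code: the names (`ContextTreeNode`, `build_context_tree`, `retrieve`, `relevance`) are hypothetical, the relevance score is a toy lexical-overlap stand-in for a learned query-awareness signal, and the `compressed` field is a placeholder for the key-value states produced by the lower model.

```python
# Conceptual sketch only (hypothetical names); illustrates a tree over text chunks
# where deeper nodes cover finer-grained spans, and retrieval descends only into
# query-relevant branches so irrelevant spans stay coarsely compressed.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ContextTreeNode:
    text: str                             # raw text span covered by this node
    compressed: Optional[object] = None   # placeholder for compressed states from the lower model
    children: List["ContextTreeNode"] = field(default_factory=list)

def build_context_tree(chunk: str, depth: int, branching: int = 2) -> ContextTreeNode:
    """Recursively split a chunk so deeper nodes hold finer-grained spans."""
    node = ContextTreeNode(text=chunk)
    if depth > 0 and len(chunk) > branching:
        size = len(chunk) // branching
        node.children = [
            build_context_tree(chunk[i * size:(i + 1) * size], depth - 1, branching)
            for i in range(branching)
        ]
    return node

def relevance(query: str, text: str) -> float:
    """Toy lexical-overlap score standing in for a learned query-aware signal."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def retrieve(node: ContextTreeNode, query: str, threshold: float = 0.3) -> List[ContextTreeNode]:
    """Return coarse nodes for irrelevant spans; descend into finer nodes for relevant ones."""
    if not node.children or relevance(query, node.text) < threshold:
        return [node]  # coarse-grained (more heavily compressed) representation suffices
    hits: List[ContextTreeNode] = []
    for child in node.children:
        hits.extend(retrieve(child, query, threshold))
    return hits

# Usage: build a two-level tree over a long context and fetch query-aware entries.
root = build_context_tree("long context about protein folding and GPU kernels ...", depth=2)
selected = retrieve(root, query="GPU kernels")
```

In the actual system, the selected entries would correspond to compressed representations injected from the lower model into the lowest layers of the upper model; the sketch only captures the multi-grained, query-aware selection logic at the data-structure level.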
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: pretraining, scaling, fine-tuning
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 3620