In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenge of simultaneously achieving high long-context performance, low computational complexity, and compatibility with pretrained models, collectively termed the "impossible triangle". We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this impossible triangle. The method splits long contexts into chunks, compresses each chunk into soft prompts via a pretrained text encoder, and uses an adapter to align these representations with a decoder-only LLM. To further enhance the LLM's understanding of and reasoning over the soft prompts, we employ two training objectives: one focused on reconstructing the encoder output and the other on long-context instruction fine-tuning. Extensive experiments, including Needle in a Haystack and LongBench, show that E2LLM not only outperforms seven existing state-of-the-art (SOTA) methods across various long-context tasks, but also achieves the lowest inference time and memory usage. Code will be available upon publication.
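To make the described pipeline concrete, below is a minimal sketch of a chunk-then-compress architecture of this kind, assuming a HuggingFace-style pretrained text encoder and a decoder-only LLM that accepts `inputs_embeds`. The module names (`ChunkCompressor`, `build_decoder_inputs`), the CLS-style pooling, and the two-layer adapter are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of an E2LLM-style pipeline: chunk -> encode -> adapt -> soft prompts.
# Assumes a HuggingFace-style encoder (returns .last_hidden_state) and a decoder-only LLM
# that can consume `inputs_embeds`; details differ from the paper's actual implementation.
import torch
import torch.nn as nn


class ChunkCompressor(nn.Module):
    """Compress each context chunk into a single soft-prompt vector in the LLM's embedding space."""

    def __init__(self, encoder: nn.Module, enc_dim: int, llm_dim: int):
        super().__init__()
        self.encoder = encoder  # pretrained text encoder (e.g., BERT-like), typically frozen or lightly tuned
        # Adapter aligning the encoder's representation space with the decoder LLM's embedding space.
        self.adapter = nn.Sequential(
            nn.Linear(enc_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, chunk_input_ids: torch.Tensor, chunk_attention_mask: torch.Tensor) -> torch.Tensor:
        # chunk_input_ids: (num_chunks, chunk_len); one row per context chunk.
        out = self.encoder(input_ids=chunk_input_ids, attention_mask=chunk_attention_mask)
        pooled = out.last_hidden_state[:, 0]  # CLS-style pooling: one vector per chunk
        return self.adapter(pooled)           # (num_chunks, llm_dim) soft prompts


def build_decoder_inputs(soft_prompts: torch.Tensor, prompt_token_embeds: torch.Tensor) -> torch.Tensor:
    """Prepend the per-chunk soft prompts to the instruction's token embeddings.

    soft_prompts: (num_chunks, llm_dim); prompt_token_embeds: (1, prompt_len, llm_dim).
    The result is fed to the decoder-only LLM via its `inputs_embeds` argument.
    """
    return torch.cat([soft_prompts.unsqueeze(0), prompt_token_embeds], dim=1)
```

In this sketch, long-context cost scales with the number of chunks rather than the raw token count, since the decoder only attends to one soft-prompt vector per chunk plus the instruction tokens; the two training objectives mentioned above (encoder-output reconstruction and long-context instruction fine-tuning) would supervise the adapter and decoder on top of these inputs.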