AESS: A Simple Method to Model Long Context in Large Language Models

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission
Abstract: As Large Language Models (LLMs) gain popularity, the need to understand long texts continues to grow. Although many models now extend the context window to several times that of the base model, their performance on long texts still varies across tasks. We therefore propose Attention Entropy Sort and Selection (AESS) to address the long-text problem. Our method achieves length generalization by leveraging the LLM itself to retrieve the information most relevant to the task when the context window is limited. Moreover, the method is task-agnostic: different tasks only require different prompts to drive the retrieval. Results on the LongBench benchmark show that AESS improves LLM performance by 9-10% compared to other retrieval methods, and the method also adapts to various models while improving their performance. AESS is thus a promising solution for applications that require LLMs to handle lengthy inputs effectively.
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment
Languages Studied: English
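The abstract leaves the mechanics of Attention Entropy Sort and Selection unspecified. As an illustration only, here is a minimal sketch of one plausible reading: score candidate context chunks by the Shannon entropy of the model's attention distribution over each chunk, sort, and keep the chunks that fit the token budget. Every name here (`select_chunks`, `chunk_len`) and the assumption that low attention entropy signals task relevance are hypothetical, not taken from the paper:

```python
import math

def entropy(probs):
    # Shannon entropy of a normalized attention distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_chunks(chunks, attn_dists, budget, chunk_len):
    # Hypothetical reading of AESS: sort chunks by the entropy of the
    # attention the prompt places on them (lower entropy = more focused,
    # assumed more relevant), then greedily fill the context budget.
    scored = sorted(zip(chunks, attn_dists), key=lambda x: entropy(x[1]))
    selected, used = [], 0
    for chunk, _ in scored:
        cost = chunk_len(chunk)
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return selected

# Toy usage: a sharply peaked attention distribution sorts ahead of a
# uniform one, so only the "focused" chunk fits the small budget.
chunks = ["focused chunk", "diffuse chunk"]
attn_dists = [[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]]
print(select_chunks(chunks, attn_dists, budget=13, chunk_len=len))
```

In a real pipeline the attention distributions would come from the LLM itself (e.g. attention weights returned by a forward pass over the prompt plus each chunk), which is presumably how the method stays task-agnostic: only the prompt changes per task.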