CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

ACL ARR 2024 June Submission 1700 Authors

14 Jun 2024 (modified: 09 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed *evicted*) without degrading perplexity when generating long sequences. However, we show that these methods, despite preserving perplexity, often drop information that is important for solving downstream tasks, a problem we call *information neglect*. To address this issue, we introduce **C**hunked **I**ns**tru**ction-aware **S**tate Eviction (**CItruS**), a novel modeling technique that integrates the attention preferences useful for a downstream task into the eviction process of hidden states. In addition, we design a method for chunked sequence processing to further improve efficiency. Our training-free method exhibits superior performance on long sequence comprehension and retrieval tasks over several strong baselines under the same memory budget, while preserving language modeling perplexity.
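To make the idea concrete, below is a minimal PyTorch sketch of instruction-aware cache eviction combined with chunked processing, based only on the abstract. The function names (`evict_with_instruction`, `process_in_chunks`), tensor shapes, scoring rule, and parameters such as `chunk_size` and `cache_budget` are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: keep the cached key/value states that the task
# instruction attends to most, processing the long input chunk by chunk.
import torch

def evict_with_instruction(keys, values, instr_queries, cache_budget):
    """Retain the `cache_budget` cached positions most attended to by the instruction.

    keys, values:  (seq_len, head_dim)   cached key/value states for one head
    instr_queries: (instr_len, head_dim) query states of the instruction tokens
    """
    # Attention of instruction tokens over the cached positions.
    scores = instr_queries @ keys.T / keys.shape[-1] ** 0.5   # (instr_len, seq_len)
    attn = scores.softmax(dim=-1).sum(dim=0)                  # pooled over instruction tokens
    keep = attn.topk(min(cache_budget, keys.shape[0])).indices.sort().values
    return keys[keep], values[keep]

def process_in_chunks(token_keys, token_values, instr_queries,
                      chunk_size=512, cache_budget=256):
    """Stream the long sequence in chunks, evicting after each chunk."""
    k_cache = torch.empty(0, token_keys.shape[-1])
    v_cache = torch.empty(0, token_values.shape[-1])
    for start in range(0, token_keys.shape[0], chunk_size):
        k_cache = torch.cat([k_cache, token_keys[start:start + chunk_size]])
        v_cache = torch.cat([v_cache, token_values[start:start + chunk_size]])
        k_cache, v_cache = evict_with_instruction(k_cache, v_cache,
                                                  instr_queries, cache_budget)
    return k_cache, v_cache
```

The design choice illustrated here is the one the abstract emphasizes: eviction is scored by what the downstream instruction attends to, rather than by generic perplexity-preserving heuristics, and the chunked loop keeps the cache within a fixed memory budget while scanning a long input.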
Paper Type: Long
Research Area: Generation
Research Area Keywords: efficient models; few-shot generation; text-to-text generation; inference methods
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 1700