Quasi-Recurrent Gist Attention: Efficiently Modeling Long Context in Large Language Model

Jing Qian; Yu Yan; Yadong Lu; Yeyun Gong; Yang Liu; Yelong Shen

21 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Large Language Models, gist attention, long contextual information

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: Transformer-based Large Language Models (LLMs) have achieved state-of-the-art results on numerous Natural Language Processing tasks. However, LLMs typically come with a predetermined context window size. This limitation, combined with the quadratic complexity of self-attention, makes pretrained LLMs struggle with long sequences. In this work, we introduce a quasi-recurrent gist attention mechanism designed to capture long contextual information effectively within LLMs. The proposed approach uses quasi-recurrent context compression to iteratively integrate historical context into a gist representation. Quasi-recurrent gist attention reduces the computational complexity from the $O(n^2)$ of full attention to $O(n)$ without changing the original Transformer architecture, which enables seamless fine-tuning from pretrained language models such as Llama (Touvron et al., 2023) and allows the context window to be extended naturally. Experimental results indicate that the proposed attention mechanism yields better performance than the full-attention approach on multiple public benchmarks, while significantly reducing the latency of modeling long context.
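
The complexity claim can be illustrated with a rough count of query-key pairs. The sketch below is not the authors' implementation: it assumes, hypothetically, that the sequence is split into fixed-length segments and that each segment attends causally to itself plus a fixed number of gist slots carried over from the previous segment. The names `segment_len` and `num_gist` are illustrative parameters that do not appear in the abstract, and the queries issued by the gist slots themselves are ignored for simplicity.

```python
def full_attention_pairs(n: int) -> int:
    """Query-key pairs scored by causal full attention over n tokens."""
    return n * (n + 1) // 2


def segmented_gist_pairs(n: int, segment_len: int, num_gist: int) -> int:
    """Query-key pairs under the assumed segment-plus-gist scheme.

    Each segment attends causally within itself; every segment after the
    first also attends to num_gist carried-over slots (an assumption made
    for this sketch, not a detail taken from the paper).
    """
    total = 0
    start = 0
    while start < n:
        length = min(segment_len, n - start)
        carried = num_gist if start > 0 else 0
        total += length * (length + 1) // 2  # causal pairs inside the segment
        total += length * carried            # pairs with the carried gist slots
        start += length
    return total


if __name__ == "__main__":
    for n in (4096, 8192, 16384):
        print(n, full_attention_pairs(n), segmented_gist_pairs(n, 512, 8))
```

With the segment length and gist count held fixed, doubling `n` roughly doubles the segmented count but roughly quadruples the full-attention count, which is the linear-versus-quadratic behavior the abstract describes.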

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 2993
