Keywords: Long context compression, sparse attention
TL;DR: We introduce UniGist, a unified gist token-based long context compression method that requires no chunk-wise training and significantly improves long context retention and efficiency through a hardware-aligned design.
Abstract: Large language models are increasingly capable of handling long-context inputs, but the memory overhead of the KV cache remains a major bottleneck for general-purpose deployment. While many compression strategies have been explored, sequence-level compression is particularly challenging due to its tendency to lose important details. We present UniGist, a gist token-based long context compression framework that removes the need for chunk-wise training, enabling the model to learn how to compress and utilize long-range context during training. To fully exploit the attention sparsity induced by gist tokens, we introduce a gist shift trick that transforms the attention layout into a right-aligned block structure and develop a block-table-free sparse attention kernel based on it. UniGist further supports one-pass training and flexible chunk sizes during inference, allowing efficient and adaptive context processing. Experiments across multiple long-context tasks show that UniGist significantly improves compression quality, with especially strong performance on detail recall and long-range dependency modeling.
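To make the gist-induced sparsity concrete, here is a minimal illustrative sketch (not the UniGist implementation; the chunk layout, function name, and masking rule are assumptions) of an attention mask in which each chunk of raw tokens is followed by one gist token, and later queries see only earlier gists rather than earlier raw tokens. The abstract's gist shift trick and block-table-free kernel are then ways to execute this kind of pattern efficiently on hardware; the sketch only shows the underlying mask.

```python
import torch

def gist_attention_mask(num_chunks: int, chunk_size: int) -> torch.Tensor:
    """Toy boolean [T, T] mask (True = attend) for a layout of
    [raw_0 ... raw_{c-1}, gist] repeated `num_chunks` times.
    A query attends causally within its own chunk and to gist tokens
    of earlier chunks; earlier raw tokens are masked out, which is the
    source of the sparsity a specialized kernel can exploit."""
    block = chunk_size + 1                      # raw tokens + one gist token
    T = num_chunks * block
    idx = torch.arange(T)
    chunk_id = idx // block                     # which chunk a position lives in
    is_gist = (idx % block) == chunk_size       # last slot of each block is the gist

    causal = idx.unsqueeze(1) >= idx.unsqueeze(0)            # mask[i, j]: query i, key j
    same_chunk = chunk_id.unsqueeze(1) == chunk_id.unsqueeze(0)
    key_is_gist = is_gist.unsqueeze(0).expand(T, T)

    # Attend causally within the current chunk, or to any earlier gist token.
    return causal & (same_chunk | key_is_gist)

if __name__ == "__main__":
    print(gist_attention_mask(num_chunks=3, chunk_size=4).int())
```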
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 21783