Keywords: Large Language Model, Context Compression
TL;DR: Enhancing Context Compression via Group Merging and Layer Semantic Alignment
Abstract: Large Language Models (LLMs) have achieved impressive performance on a wide range of Natural Language Processing (NLP) tasks. However, when applied to long-context scenarios, they face two challenges: computational inefficiency and redundant information. This paper introduces GMSA, a context compression method based on an encoder-decoder architecture, which addresses these challenges by reducing input sequence length and redundant information. Structurally, GMSA has two key components: Group Merging and Layer Semantic Alignment (LSA). Group Merging extracts summary vectors evenly and efficiently from the original context. Layer Semantic Alignment, in turn, aligns the high-level abstract summary vectors with the low-level primary input semantics, bridging the semantic gap between different layers. During training, GMSA first learns soft tokens that capture nearly complete semantics via autoencoder training. To further adapt GMSA to downstream tasks, we propose Knowledge Extraction Fine-tuning (KEFT) to extract task-relevant knowledge from these soft tokens. GMSA not only significantly outperforms the traditional compression paradigm in context restoration but also converges stably and substantially faster with only a few encoder layers. We further evaluate GMSA on question answering, summarization, and general knowledge retention across two backbones (LLaMA-2-7B and Qwen2-7B), demonstrating its effectiveness and superiority; for example, on the NaturalQuestions dataset, GMSA achieves approximately a 2x speedup in end-to-end inference while outperforming various methods by a large margin.
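To make the Group Merging idea concrete, the sketch below shows one plausible reading of "extracting summary vectors evenly from the original context": consecutive token hidden states are partitioned into fixed-size groups and each group is pooled into a single summary vector. The abstract does not specify the merging operator, so the mean pooling, the `group_size` parameter, and the `group_merge` function name here are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def group_merge(hidden_states: torch.Tensor, group_size: int) -> torch.Tensor:
    """Merge consecutive groups of token hidden states into summary vectors.

    hidden_states: (seq_len, hidden_dim) encoder outputs for the original context.
    group_size:    number of adjacent tokens collapsed into one summary vector.
    Returns a tensor of shape (ceil(seq_len / group_size), hidden_dim).
    Note: this is an illustrative sketch assuming mean pooling as the merge op.
    """
    seq_len, hidden_dim = hidden_states.shape
    # Zero-pad so seq_len is a multiple of group_size (only the last group is affected).
    pad = (-seq_len) % group_size
    if pad:
        hidden_states = torch.cat(
            [hidden_states, hidden_states.new_zeros(pad, hidden_dim)], dim=0
        )
    # Reshape to (num_groups, group_size, hidden_dim) and average within each group.
    groups = hidden_states.view(-1, group_size, hidden_dim)
    return groups.mean(dim=1)
```

Under this reading, a 1,024-token context with `group_size=8` would be compressed into 128 summary vectors, which the decoder then consumes in place of the full sequence.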
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 11888