Keywords: distillation, data-efficient training, NLP in resource-constrained settings
TL;DR: An efficient knowledge-distillation prompt for generating the training set used to train a context compressor.
Abstract: This paper focuses on efficient data compression for Large Language Models. Given the linear context growth of self-evaluating and divide-and-conquer LLM modeling methods, techniques are needed to manage the size of shared context. Existing approaches compress data through prompt tuning, using detailed instructions to guide the output. However, this approach may be suboptimal: (i) explicitly specified principles may restrict an LLM's inherent ability to compress data; (ii) longer prompts increase the overhead of processing the data.
To address these issues, we built upon the framework proposed by LLMLingua2, which formulates data compression as a token classification problem and trains knowledge-distilled models on data generated with compression prompts. We analyzed their model's outputs, designed new prompts targeting areas for improvement, and evaluated them on downstream tasks such as summarization, question answering, and mathematics. We then tested our best prompting method on the MeetingBank summarization task: at 3% of the size of LLMLingua2's prompt, it achieves a 61% reduction in the size of the distilled data and higher model evaluation results than LLMLingua2's prompting method on all eight metrics, at a low-resource level of 1,000 training pairs.
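For readers unfamiliar with the token-classification formulation, the sketch below illustrates the general recipe rather than the paper's implementation: a teacher LLM's compressed outputs are converted into per-token keep/drop labels, which are then used to fine-tune a small classifier that performs the compression. The backbone model, the word-matching labeling heuristic, and the training loop are assumptions made for illustration only.

```python
# Minimal sketch of prompt compression as token classification
# (illustrative only; backbone, labeling heuristic, and training loop are assumptions).
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # assumed backbone
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)  # label 0 = drop, label 1 = keep

def make_labels(original: str, compressed: str):
    """Heuristic distillation labels: mark a word 'keep' if it survives in the
    teacher LLM's compressed text (real pipelines use a more careful alignment)."""
    kept = set(compressed.split())
    words = original.split()
    return words, [1 if w in kept else 0 for w in words]

def encode(example):
    words, word_labels = make_labels(example["original"], example["compressed"])
    enc = tokenizer(words, is_split_into_words=True, truncation=True,
                    padding="max_length", max_length=256, return_tensors="pt")
    labels = torch.full(enc["input_ids"].shape, -100, dtype=torch.long)  # ignore specials/padding
    for i, wid in enumerate(enc.word_ids(0)):
        if wid is not None:
            labels[0, i] = word_labels[wid]
    enc["labels"] = labels
    return {k: v.squeeze(0) for k, v in enc.items()}

# `pairs` would be produced by running the compression prompt through a teacher LLM.
pairs = [{"original": "The quarterly meeting was held on Monday to review the budget.",
          "compressed": "meeting Monday review budget"}]
loader = DataLoader([encode(p) for p in pairs], batch_size=1, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for batch in loader:
    loss = model(**batch).loss  # token-level cross-entropy over keep/drop labels
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At inference time, a classifier trained this way scores each token of the input and retains only those predicted as "keep", so compression quality depends directly on the distilled training pairs that the prompt produces.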
Archival Status: Non‑archival
Paper Length: Short Paper (up to 4 pages of content)
Submission Number: 352