TARMER: A Task-Aware Recursive Prompt Compression Approach with Memory Buffer

ACL ARR 2025 May Submission 1926 Authors

18 May 2025 (modified: 03 Jul 2025)
License: CC BY 4.0
Abstract: As large language models (LLMs) scale to longer contexts, the quadratic cost of attention poses computational and memory challenges. Segmenting inputs mitigates this but breaks inter-segment dependencies. We propose TARMER, a \underline{t}ask-\underline{a}ware \underline{r}ecursive prompt compression method with a memory buff\underline{e}r. It jointly models compression and generation in a single forward pass, avoiding intermediate compressed prompts. Guided by the task description and query, TARMER improves semantic understanding and removes task-irrelevant redundancy. Experiments on dialogue, multiple-choice, and out-of-distribution tasks show that TARMER achieves up to 16$\times$ compression with minimal performance drop, while the memory buffer keeps space complexity constant.
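The abstract describes a recursive scheme: the long prompt is split into segments, and each segment is folded into a fixed-size memory buffer conditioned on the task description and query, so peak context length stays constant. The sketch below is purely illustrative and not the authors' implementation; `compress_fn`, `toy_compressor`, and the character-based `buffer_limit` are hypothetical stand-ins (in TARMER the compression would be performed by the LLM itself within its forward pass).

```python
# Minimal sketch (assumed, not the authors' code): recursive prompt compression
# with a constant-size memory buffer, guided by a task description and query.
from typing import Callable, List


def recursive_compress(
    segments: List[str],                                 # long prompt split into segments
    task: str,                                           # task description guiding compression
    query: str,                                          # user query guiding compression
    compress_fn: Callable[[str, str, str, str], str],    # hypothetical compressor
    buffer_limit: int = 512,                             # constant buffer size (characters here)
) -> str:
    """Fold segments into a bounded memory buffer, one segment per step."""
    memory = ""  # the buffer carries compressed context across segments
    for seg in segments:
        # Each step only sees the current segment plus the buffer, so the
        # working context stays constant regardless of the number of segments.
        memory = compress_fn(memory, seg, task, query)
        memory = memory[:buffer_limit]  # enforce constant space
    return memory


# Toy stand-in compressor: keep sentences that mention a query term.
def toy_compressor(memory: str, segment: str, task: str, query: str) -> str:
    terms = [w.lower() for w in query.split()]
    kept = [s for s in segment.split(".") if any(t in s.lower() for t in terms)]
    return (memory + " " + ". ".join(s.strip() for s in kept)).strip()


if __name__ == "__main__":
    segs = [
        "The report covers Q1 revenue. The weather was mild.",
        "Q1 revenue grew 12 percent. The office moved floors.",
    ]
    print(recursive_compress(segs, task="summarize finances", query="revenue",
                             compress_fn=toy_compressor))
```

The key design point mirrored here is that compression is applied recursively, segment by segment, rather than compressing the full prompt at once, which is what keeps memory usage independent of input length.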
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: LLM Efficiency; NLP in resource-constrained settings
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Submission Number: 1926