ChunkOut: Information-Sufficient Token Pruning for Efficient Prompt Compression

ACL ARR 2025 July Submission1260 Authors

29 Jul 2025 (modified: 20 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Large language models (LLMs) increasingly rely on long prompts or retrieved contexts, driving up inference latency and cost. We observe that tokens whose forward probability is already high given the preceding context contribute little additional information, as the model has effectively encoded their content in its hidden state. Leveraging this information-sufficiency insight, we introduce CHUNKOUT, a model-agnostic algorithm that scores each token with its next-token likelihood and simply drops those above a threshold. CHUNKOUT requires no extra training, incurs O(n) overhead, and can be plugged into any frozen LLM. Across QA and summarization benchmarks, it trims 50% of prompt tokens while maintaining (and occasionally improving) task accuracy, outperforming prior compression baselines by up to 5 percentage points. CHUNKOUT offers a principled yet lightweight path toward faster, cheaper, and greener LLM inference.
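The pruning rule described in the abstract (score each token with its next-token likelihood under a frozen LM, drop tokens above a threshold) can be sketched as below. This is a hypothetical illustration, not the authors' implementation: the function, its name, and the threshold value are assumptions, and the per-token probabilities are taken as given (in practice they would come from one forward pass of the frozen LLM, which is the source of the O(n) overhead).

```python
def chunkout_prune(tokens, probs, threshold=0.9):
    """Prune tokens the model already predicts with high confidence.

    tokens: list of prompt tokens.
    probs:  probs[i] is the frozen LM's probability of tokens[i]
            given the preceding context tokens[:i].
    threshold: tokens with probability >= threshold are assumed
            information-redundant and dropped (value is illustrative).
    """
    if not tokens:
        return []
    # The first token has no preceding context, so keep it unconditionally.
    kept = [tokens[0]]
    for tok, p in zip(tokens[1:], probs[1:]):
        if p < threshold:  # low probability -> informative -> keep
            kept.append(tok)
    return kept

# A token the model finds highly predictable ("sat" after "The cat",
# a repeated "the") is dropped; surprising tokens survive.
pruned = chunkout_prune(
    ["The", "cat", "sat", "on", "the", "mat"],
    [0.10, 0.20, 0.95, 0.50, 0.98, 0.99],
)
```

The kept sequence here would be `["The", "cat", "on"]`, a 50% reduction consistent with the compression rates the abstract reports.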
Paper Type: Short
Research Area: Generation
Research Area Keywords: Generation, Language Modeling, Context Reduction
Contribution Types: Approaches to low compute settings-efficiency
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
A2 Elaboration: Our work focuses on model compression and algorithmic efficiency, and does not introduce new application scenarios or user interaction, so we did not discuss potential societal risks.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 4
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Appendix A
B3 Artifact Use Consistent With Intended Use: N/A
B3 Elaboration: Appendix A
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: No. All datasets used are publicly available, well-established benchmarks that do not contain personally identifying information or offensive content, as documented in their original sources.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 4 briefly documents the public datasets used in our experiments, including their domains and task types. All datasets are well-established benchmarks with extensive prior documentation in their original papers.
B6 Statistics For Data: Yes
B6 Elaboration: Section 4 provides statistics on the size and splits of all public datasets used.
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Appendix
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Appendix
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4 reports results from a single run for each experiment.
C4 Parameters For Packages: Yes
C4 Elaboration: Appendix
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: ChatGPT was used to help polish the English writing.
Author Submission Checklist: yes
Submission Number: 1260