From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition
Keywords: Large Language Models, Deep Search, Context Compression, Hallucination Mitigation
Abstract: Managing long contexts is a bottleneck for large language models (LLMs), introducing high costs and noise. Existing compression methods often disrupt local coherence or rely on latent encodings that suffer from positional bias and are incompatible with black-box APIs. We propose the EDU-based Context Compressor, an explicit framework designed to preserve structure and detail. We reformulate compression as a structure-then-select process: first, LingoEDU parses text into an Elementary Discourse Unit (EDU) relation tree whose nodes are anchored to source indices, preventing hallucination; second, a lightweight module ranks and selects query-relevant sub-trees. To evaluate structural understanding, we introduce StructBench, a manually annotated dataset of 248 documents. Empirical results demonstrate that our method achieves state-of-the-art structural prediction accuracy, significantly outperforming frontier LLMs while reducing cost. Furthermore, our structure-aware compression substantially enhances performance across downstream applications, ranging from long-context tasks to complex Deep Search scenarios.
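To make the structure-then-select pipeline concrete, below is a minimal Python sketch, not the paper's implementation: the EDUNode class, the term-density score heuristic, the select_subtrees/compress helpers, and the 0.4 threshold are all illustrative assumptions standing in for the LingoEDU parser and the learned ranking module. The property it preserves is the one the abstract emphasizes: every selected unit is copied verbatim from the source via its (start, end) anchor, so nothing in the compressed output is generated.

```python
from dataclasses import dataclass, field

@dataclass
class EDUNode:
    """One Elementary Discourse Unit, anchored to a source span.

    (start, end) index into the original document, so every unit the
    compressor emits is verbatim source text -- nothing is generated.
    """
    start: int
    end: int
    children: list["EDUNode"] = field(default_factory=list)

def score(node: EDUNode, source: str, query: str) -> float:
    """Toy relevance: query-term density over the EDU's span.

    Density (rather than raw overlap) keeps broad parent nodes from
    always outscoring their focused children. The paper's lightweight
    ranking module would replace this heuristic.
    """
    words = source[node.start:node.end].lower().split()
    if not words:
        return 0.0
    terms = query.lower().split()
    hits = sum(any(t in w for w in words) for t in terms)
    return hits / len(words)

def select_subtrees(node: EDUNode, source: str, query: str,
                    threshold: float = 0.4) -> list[EDUNode]:
    """Keep a sub-tree whole if its root is dense enough in query
    terms; otherwise descend and test its children independently."""
    if score(node, source, query) >= threshold:
        return [node]
    kept: list[EDUNode] = []
    for child in node.children:
        kept.extend(select_subtrees(child, source, query, threshold))
    return kept

def compress(root: EDUNode, source: str, query: str) -> str:
    """Reassemble the compressed context from the selected spans,
    preserving original document order."""
    kept = sorted(select_subtrees(root, source, query),
                  key=lambda n: n.start)
    return " ".join(source[n.start:n.end] for n in kept)

if __name__ == "__main__":
    source = "EDUs anchor spans. Latent codes drift. Trees keep structure."
    root = EDUNode(0, len(source), children=[
        EDUNode(0, 18), EDUNode(19, 38), EDUNode(39, 60)])
    print(compress(root, source, query="EDU spans"))  # EDUs anchor spans.
```

One design point the sketch illustrates: selection operates on whole sub-trees rather than isolated sentences, which is what lets the compressed output retain local coherence.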
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 1283