From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition
Keywords: Large Language Models, Deep Search, Context Compression, Hallucination Mitigation
Abstract: Managing long contexts is a bottleneck for large language models (LLMs), introducing high costs and noise. Existing compression methods often disrupt local coherence or rely on latent encodings that suffer from positional bias and are incompatible with black-box APIs. We propose the EDU-based Context Compressor, an explicit framework designed to preserve structure and detail. We reformulate compression as a structure-then-select process: first, LingoEDU parses text into an Elementary Discourse Unit (EDU) relation tree whose nodes are anchored to source indices, preventing hallucination; second, a lightweight module ranks and selects query-relevant sub-trees. To evaluate structural understanding, we introduce StructBench, a manually annotated dataset of 248 documents. Empirical results demonstrate that our method achieves state-of-the-art structural prediction accuracy, significantly outperforming frontier LLMs while reducing cost. Furthermore, our structure-aware compression substantially enhances performance across downstream applications, ranging from long-context tasks to complex Deep Search scenarios.
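To make the structure-then-select pipeline concrete, below is a minimal Python sketch, not the paper's implementation: the EDUNode class, the term-density score heuristic, the select_subtrees/compress helpers, and the 0.4 threshold are all illustrative assumptions standing in for the LingoEDU parser and the learned ranking module. The property it preserves is the one the abstract emphasizes: every selected unit is copied verbatim from the source via its (start, end) anchor, so nothing in the compressed output is generated.

```python
from dataclasses import dataclass, field

@dataclass
class EDUNode:
    """One Elementary Discourse Unit, anchored to a source span.

    (start, end) index into the original document, so every unit the
    compressor emits is verbatim source text -- nothing is generated.
    """
    start: int
    end: int
    children: list["EDUNode"] = field(default_factory=list)

def score(node: EDUNode, source: str, query: str) -> float:
    """Toy relevance: query-term density over the EDU's span.

    Density (rather than raw overlap) keeps broad parent nodes from
    always outscoring their focused children. The paper's lightweight
    ranking module would replace this heuristic.
    """
    words = source[node.start:node.end].lower().split()
    if not words:
        return 0.0
    terms = query.lower().split()
    hits = sum(any(t in w for w in words) for t in terms)
    return hits / len(words)

def select_subtrees(node: EDUNode, source: str, query: str,
                    threshold: float = 0.4) -> list[EDUNode]:
    """Keep a sub-tree whole if its root is dense enough in query
    terms; otherwise descend and test its children independently."""
    if score(node, source, query) >= threshold:
        return [node]
    kept: list[EDUNode] = []
    for child in node.children:
        kept.extend(select_subtrees(child, source, query, threshold))
    return kept

def compress(root: EDUNode, source: str, query: str) -> str:
    """Reassemble the compressed context from the selected spans,
    preserving original document order."""
    kept = sorted(select_subtrees(root, source, query),
                  key=lambda n: n.start)
    return " ".join(source[n.start:n.end] for n in kept)

if __name__ == "__main__":
    source = "EDUs anchor spans. Latent codes drift. Trees keep structure."
    root = EDUNode(0, len(source), children=[
        EDUNode(0, 18), EDUNode(19, 38), EDUNode(39, 60)])
    print(compress(root, source, query="EDU spans"))  # EDUs anchor spans.
```

One design point the sketch illustrates: selection operates on whole sub-trees rather than isolated sentences, which is what lets the compressed output retain local coherence.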
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 1283