Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models

ACL ARR 2025 February Submission128 Authors

03 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large language models (LLMs) often struggle to accurately read and comprehend extremely long texts. Current methods for improvement typically rely on splitting long contexts into fixed-length chunks. However, fixed truncation risks separating semantically relevant content, leading to ambiguity and compromising accurate understanding. To overcome this limitation, we propose a straightforward approach for dynamically separating and selecting chunks of long context, facilitating a more streamlined input for LLMs. In particular, we compute semantic similarities between adjacent sentences, using lower similarities to adaptively divide long contexts into variable-length chunks. We further train a question-aware classifier to select sensitive chunks that are critical for answering specific questions. Experimental results on both single-hop and multi-hop question-answering benchmarks indicate that the proposed approach significantly outperforms state-of-the-art baselines. More importantly, our approach demonstrates consistent robustness across varying input lengths, supporting up to 256k tokens. Our datasets and code are available at the following link: https://anonymous.4open.science/r/DCS-4C88.
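The core chunking idea in the abstract — splitting at sentence boundaries where adjacent-sentence semantic similarity drops — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the toy bag-of-words embedding and the `threshold` value are stand-ins for the semantic encoder and tuning the paper would actually use.

```python
import re
from collections import Counter
from math import sqrt

def _embed(sentence):
    # Toy bag-of-words vector; a real system would use a semantic sentence encoder.
    return Counter(re.findall(r"\w+", sentence.lower()))

def _cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def dynamic_chunks(text, threshold=0.2):
    """Split text into variable-length chunks, breaking where adjacent
    sentences fall below a similarity threshold (illustrative values)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    vecs = [_embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if _cosine(vecs[i - 1], vecs[i]) < threshold:
            # Low similarity: start a new chunk at this sentence boundary.
            chunks.append(" ".join(current))
            current = [sentences[i]]
        else:
            current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

On a short example, related sentences stay together while an unrelated one starts a new chunk; the selected chunks would then be scored by the question-aware classifier described in the abstract.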
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: reading comprehension, multi-hop QA, question generation
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 128