The Role of Context in Sequential Sentence Classification for Long Documents

ACL ARR 2025 February Submission5864 Authors

16 Feb 2025 (modified: 09 May 2025) · CC BY 4.0
Abstract: Sequential sentence classification extends traditional sentence classification by incorporating broader document context. However, state-of-the-art approaches face two major challenges on long documents: pretrained language models struggle with input-length constraints, while hierarchical models often introduce irrelevant content. To address these limitations, we propose a document-level retrieval approach that extracts only the most relevant context for each sentence. Specifically, we introduce two heuristic strategies: Sequential, which captures local information from neighboring sentences, and Selective, which retrieves the most semantically similar sentences from the document. Experiments on legal-domain datasets show that both heuristics improve performance, and the Sequential heuristic outperforms hierarchical models on two of the three datasets, demonstrating the benefits of targeted context.
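The two context-retrieval heuristics described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the window size and `k` are arbitrary, and bag-of-words cosine similarity stands in for whatever sentence encoder the paper actually uses for the Selective strategy.

```python
from collections import Counter
from math import sqrt


def sequential_context(sentences, i, window=2):
    """Sequential heuristic (sketch): take up to `window` sentences on
    each side of the target sentence, capturing local context."""
    lo = max(0, i - window)
    hi = min(len(sentences), i + window + 1)
    return [s for j, s in enumerate(sentences[lo:hi], start=lo) if j != i]


def _cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def selective_context(sentences, i, k=2):
    """Selective heuristic (sketch): retrieve the k sentences most
    semantically similar to the target. Bag-of-words cosine similarity
    is used here purely for illustration."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    scored = [(j, _cosine(vecs[i], vecs[j])) for j in range(len(sentences)) if j != i]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [sentences[j] for j, _ in scored[:k]]
```

In a real system the retrieved context would be concatenated with the target sentence before being fed to the classifier, keeping the input within the language model's length limit.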
Paper Type: Short
Research Area: Information Extraction
Research Area Keywords: open information extraction, retrieval, legal NLP, fine-tuning
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 5864