Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization

ACL ARR 2024 August Submission246 Authors

15 Aug 2024 (modified: 05 Sept 2024) · ACL ARR 2024 August Submission · CC BY 4.0
Abstract: Detecting factual inconsistency in long document summarization remains challenging, given the complex structure of source articles and the length of summaries. In this work, we study factual inconsistency errors and connect them to discourse analysis. We find that errors are more common in complex sentences and are associated with several discourse features. We propose a framework that decomposes long texts into discourse-inspired chunks and uses discourse information to better aggregate sentence-level scores predicted by NLI models. Our approach improves performance on top of different model baselines across several evaluation benchmarks focused on long document summarization, including DiverSumm, LongSciVerify, and LongEval. This underscores the value of incorporating discourse features into models that score summaries for factual consistency with long documents.
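The pipeline the abstract describes (chunk the source, score each summary sentence against chunks with an NLI model, then aggregate sentence scores) can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the fixed-size chunker stands in for the discourse-inspired segmentation, the token-overlap scorer is a hypothetical placeholder for a real NLI entailment model, and the mean aggregation ignores the discourse-informed weighting the paper proposes.

```python
def chunk_document(sentences, chunk_size=3):
    """Group consecutive sentences into fixed-size chunks.

    A crude stand-in for the paper's discourse-inspired segmentation.
    """
    return [sentences[i:i + chunk_size]
            for i in range(0, len(sentences), chunk_size)]


def nli_entailment_score(premise, hypothesis):
    """Placeholder scorer: token-overlap ratio in place of a real NLI
    model's entailment probability."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0


def score_summary(doc_sentences, summary_sentences, chunk_size=3):
    """Score a summary for consistency with its source document.

    Each summary sentence takes the score of its best-supported chunk;
    sentence scores are then averaged (the paper instead aggregates
    with discourse information).
    """
    chunks = chunk_document(doc_sentences, chunk_size)
    sent_scores = [
        max(nli_entailment_score(" ".join(chunk), hyp) for chunk in chunks)
        for hyp in summary_sentences
    ]
    return sum(sent_scores) / len(sent_scores) if sent_scores else 0.0
```

For example, a summary sentence copied verbatim from the source scores 1.0 under this toy scorer, while an unsupported sentence scores near 0.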
Paper Type: Long
Research Area: Discourse and Pragmatics
Research Area Keywords: evaluation, factuality, long-form summarization, discourse relations
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 246