Measuring Factual Consistency of Abstractive Summaries

Anonymous

17 Sept 2021 (modified: 05 May 2023) · ACL ARR 2021 September Blind Submission
Abstract: Recent abstractive summarization systems fail to generate factually consistent -- faithful -- summaries, which heavily limits their practical application. Commonly, these models mix up concepts from the source or hallucinate new content, ignoring the source entirely. Addressing this faithfulness problem is perhaps the most critical challenge for current abstractive summarization systems. First automatic faithfulness metrics have been proposed, but we argue that existing methods do not yet utilize all the "machinery" this field has to offer, and we introduce new approaches to assess factual correctness. We evaluate existing and proposed methods by correlating them with human judgements and find that BERTScore works well. Next, we conduct a data analysis, which reveals common problems, suggests ways to further improve the metrics, and indicates that combining multiple metrics is promising. Finally, we exploit faithfulness metrics in pre- and post-processing steps to decrease the factual errors made by state-of-the-art summarization systems. We find that simple techniques like filtering training data and re-ranking generated summaries can increase faithfulness by a substantial margin.
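The abstract mentions using BERTScore as a faithfulness metric and re-ranking generated summaries as a post-processing step. The following is a minimal sketch of how such re-ranking could look, using the bert-score package to compare each candidate summary against its source document; the function name, example texts, and the choice of F1 as the ranking signal are illustrative assumptions, not the paper's exact method.

# Minimal sketch: score candidate summaries against their source document with
# BERTScore and keep the highest-scoring candidate (re-ranking). Model choice
# and thresholds are assumptions; the paper's exact setup may differ.
from bert_score import score

def rerank_by_bertscore(candidates, source, lang="en"):
    """Return (summary, F1) pairs sorted by BERTScore F1 against the source."""
    # Compare every candidate to the same source document.
    sources = [source] * len(candidates)
    _, _, f1 = score(candidates, sources, lang=lang, verbose=False)
    return sorted(zip(candidates, f1.tolist()), key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    source_doc = (
        "The city council approved the new transit budget on Tuesday, "
        "allocating funds for additional bus routes starting next year."
    )
    candidate_summaries = [
        "The council approved a transit budget adding bus routes next year.",
        "The mayor vetoed the transit budget on Tuesday.",  # unfaithful candidate
    ]
    for summary, f1 in rerank_by_bertscore(candidate_summaries, source_doc):
        print(f"{f1:.3f}  {summary}")

In this sketch the unfaithful candidate receives a lower F1 and is ranked last; in practice one would generate several beams per input and emit the top-ranked one.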