Improving Factual Error Correction for Abstractive Summarization via Data Distillation and Conditional-Generation Cloze

Yingqi Zhu, Yiyang Li, Lei Li, Dingxin Hu, Xueyi Hao, Dongsheng Chen, Xingyue Zhang, Zhejun Zhang, Yanquan Zhou, Marina Litvak, Natalia Vanetik

Published: 01 Jan 2025, Last Modified: 26 Jan 2026. Venue: IEEE Transactions on Audio, Speech and Language Processing. License: CC BY-SA 4.0

Abstract: Improving factual consistency in abstractive summarization has been a focus of recent research. One promising approach is post-editing. However, previous work has not made sufficient use of the factual factors in summaries and suffers from the negative effects of its training datasets. In this paper, we first propose FactCloze, a novel factual error correction model based on a conditional-generation cloze task. FactCloze can construct the causality among factual factors while also determining whether a blank can be answered. We then propose a data distillation method that generates a more faithful summarization dataset, SummDSC, via multi-dimensional evaluation. We validate our method on both non-LLM-generated and LLM-generated datasets. Besides BART and T5, we implement FactCloze by prompting DeepSeek. Finally, we examine the differences between LLM-based and traditional evaluation metrics for factual error correction.
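
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes factual factors arrive as plain strings, that a blank is marked with a hypothetical `<mask>` token, that an unanswerable blank is labeled with a hypothetical `<unanswerable>` token, and that "answerable" can be approximated by a case-insensitive substring check against the source. The `distill` helper likewise assumes that multi-dimensional evaluation reduces to per-metric thresholds; all names and thresholds here are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

MASK = "<mask>"                  # hypothetical blank token (assumption)
UNANSWERABLE = "<unanswerable>"  # hypothetical "cannot be answered" label (assumption)


@dataclass
class ClozeExample:
    source: str          # source document that conditions the generation
    masked_summary: str  # summary with one factual factor blanked out
    target: str          # factor to generate, or UNANSWERABLE


def build_cloze_examples(source: str, summary: str, factors: List[str]) -> List[ClozeExample]:
    """Blank out each factual factor in turn and pair it with a target.

    Assumption: a factor counts as answerable if it appears verbatim
    (case-insensitively) in the source; a real system would be less naive.
    """
    examples = []
    for factor in factors:
        if factor not in summary:
            continue  # skip factors that do not occur in the summary
        masked = summary.replace(factor, MASK, 1)  # blank the first occurrence only
        answerable = factor.lower() in source.lower()
        examples.append(ClozeExample(source, masked, factor if answerable else UNANSWERABLE))
    return examples


def distill(samples: List[dict],
            scorers: Dict[str, Callable[[dict], float]],
            thresholds: Dict[str, float]) -> List[dict]:
    """Keep only samples that meet the threshold on every evaluation dimension."""
    return [
        s for s in samples
        if all(scorers[name](s) >= thresholds[name] for name in thresholds)
    ]


if __name__ == "__main__":
    src = "Alice Smith joined Acme in 2019 as an engineer."
    summ = "Alice Smith joined Acme in 2021."
    for ex in build_cloze_examples(src, summ, ["Alice Smith", "Acme", "2021"]):
        print(ex.masked_summary, "->", ex.target)
```

In this toy run the blank for "2021" is labeled unanswerable because the source never mentions that year, which is the kind of signal a correction model could use to decide that a span should be rewritten or dropped.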

External IDs: doi:10.1109/taslpro.2025.3567744