How Faithful Are Your Summaries? A Study of NLI-Based Verification in Portuguese

Felipe S. F. Paula, Matheus Westhelle, Maria Cecília M. Corrêa, Luciana Regina Bencke, Viviane P. Moreira

Published: 2025, Last Modified: 21 Dec 2025. Venue: STIL 2025. License: CC BY-SA 4.0

Abstract: Abstractive summarization systems often generate content that is not supported by the source text, making faithfulness verification a critical evaluation step. In this paper, we investigate the reliability of Natural Language Inference (NLI) methods for detecting summary faithfulness in Portuguese. Our contribution is two-fold: (i) we introduce VERISUMM, the first large-scale dataset for summary faithfulness detection in Portuguese, and (ii) we benchmark several NLI-based approaches applied to faithfulness detection. Our experiments revealed that zero-shot models exhibit low to moderate performance and that fine-tuning improves results. However, our error analysis showed that NLI models rely heavily on lexical overlap heuristics, limiting their effectiveness.
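To make the lexical overlap concern concrete, the following is a minimal sketch of a token-overlap score between a source and a summary. It is not the authors' method and is not taken from VERISUMM: the function names `tokenize` and `lexical_overlap` and the Portuguese example sentences are hypothetical, written only for illustration. It shows how a summary that contradicts its source can still share most of its tokens with it, which is the kind of surface signal a model relying on overlap could mistake for faithfulness.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and extract word tokens. In Python 3, \w is Unicode-aware
    # for str patterns, so accented Portuguese letters stay inside tokens.
    return re.findall(r"\w+", text.lower())


def lexical_overlap(source: str, summary: str) -> float:
    """Fraction of summary tokens that also occur in the source.

    Illustrative heuristic only (an assumption of this sketch), not an
    NLI model and not the scoring used in the paper.
    """
    source_vocab = set(tokenize(source))
    summary_tokens = tokenize(summary)
    if not summary_tokens:
        return 0.0
    hits = sum(1 for tok in summary_tokens if tok in source_vocab)
    return hits / len(summary_tokens)


# Hypothetical examples, invented for this sketch.
source = "O governo anunciou um aumento de impostos em 2024."
faithful = "O governo anunciou um aumento de impostos."
unfaithful = "O governo anunciou uma redução de impostos em 2024."

faithful_score = lexical_overlap(source, faithful)
unfaithful_score = lexical_overlap(source, unfaithful)
```

Here the faithful summary scores 1.0, while the contradicting summary still scores 7/9 (about 0.78) because only "uma" and "redução" are absent from the source. A decision rule driven mainly by overlap would struggle to separate the two.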

External IDs: dblp:conf/stil/PaulaWCBM25