Abstract: Abstractive summarization systems often generate content that is not supported by the source text, making faithfulness verification a critical evaluation step. In this paper, we investigate the reliability of Natural Language Inference (NLI) methods for detecting summary faithfulness in Portuguese. Our contribution is twofold: (i) we introduce VERISUMM, the first large-scale dataset for summary faithfulness detection in Portuguese, and (ii) we benchmark several NLI-based approaches to faithfulness detection. Our experiments show that zero-shot models achieve low to moderate performance and that fine-tuning improves results. However, our error analysis shows that NLI models rely heavily on lexical-overlap heuristics, which limits their effectiveness.
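To make the lexical-overlap failure mode concrete, here is a minimal, hypothetical sketch. It is not the authors' method or any model from the paper. It is a toy "faithfulness" rule that labels a summary as faithful when most of its tokens also appear in the source, which is the kind of shortcut the error analysis describes. The function names, the 0.8 threshold, and the Portuguese sentences are illustrative assumptions, not taken from VERISUMM.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; with str patterns, \\w also matches accented Portuguese letters."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_ratio(source: str, summary: str) -> float:
    """Fraction of distinct summary tokens that also occur in the source."""
    summ = tokens(summary)
    if not summ:
        return 0.0
    return len(summ & tokens(source)) / len(summ)


def overlap_heuristic(source: str, summary: str, threshold: float = 0.8) -> bool:
    """Hypothetical shortcut classifier: 'faithful' iff overlap >= threshold (arbitrary value)."""
    return overlap_ratio(source, summary) >= threshold


# Constructed examples (assumption, not from any dataset).
source = "O ministro anunciou que o imposto não vai subir em 2025."
# Unfaithful: dropping "não" flips the meaning, yet every token is in the source.
unfaithful = "O ministro anunciou que o imposto vai subir em 2025."
# Faithful paraphrase with different wording: overlap is only 6/10.
paraphrase = "Segundo o ministro, não haverá aumento do imposto em 2025."

print(overlap_ratio(source, unfaithful), overlap_heuristic(source, unfaithful))
print(overlap_ratio(source, paraphrase), overlap_heuristic(source, paraphrase))
```

The sketch gets both cases wrong. It accepts the contradiction because the overlap is 1.0, and it rejects the faithful paraphrase because the overlap is 0.6. A model that has learned a similar shortcut would fail in the same way.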
External IDs: dblp:conf/stil/PaulaWCBM25