Automated Fact-Checking in Brazilian Portuguese: Resources and Baselines

Marcelo Mussi Delucis, Lucas Fraga, Otávio Parraga, Christian Mattjie de Oliveira, Rafaela Cappelari Ravazio, Rodrigo C. Barros, Lucas S. Kupssinskü

Published: 2025, Last Modified: 12 Nov 2025, Venue: STIL 2025, License: CC BY-SA 4.0

Abstract: The spread of misinformation presents a growing societal challenge, particularly in low-resource languages such as Brazilian Portuguese (PTBR), where the scarcity of high-quality datasets limits automated fact-checking tools. In this work, we introduce translated PTBR versions of two influential English-language fact-checking datasets: LIAR and AVERITEC. These resources support multi-class veracity classification and incorporate evidence-based reasoning. We also establish baseline results for both datasets using a range of model configurations, including zero-shot and few-shot prompting with Gemma 3, and fine-tuning of encoder-based models such as mBERT, BERT-Large, and BERTimbau-Large. Across both datasets, fine-tuned encoder-based models consistently outperformed Gemma 3 in zero-shot and few-shot settings. Our results underscore the importance of task-specific fine-tuning and evidence inclusion for veracity classification in PTBR. All datasets, translation scripts, and evaluation protocols are publicly released to support further research in this area.

External IDs: dblp:conf/stil/DelucisFPORBK25