Automated Fact-Checking in Brazilian Portuguese: Resources and Baselines

Published: 2025, Last Modified: 12 Nov 2025 | STIL 2025 | CC BY-SA 4.0
Abstract: The spread of misinformation presents a growing societal challenge, particularly in low-resource languages such as Brazilian Portuguese (PTBR), where the scarcity of high-quality datasets limits automated fact-checking tools. In this work, we introduce translated PTBR versions of two influential English-language fact-checking datasets: LIAR and AVERITEC. These resources support multi-class veracity classification and incorporate evidence-based reasoning. We also establish baseline results for both datasets using a range of model configurations, including zero-shot and few-shot prompting with Gemma 3, and fine-tuning of encoder-based models such as mBERT, BERT-Large, and BERTimbau-Large. Across both datasets, fine-tuned encoder-based models consistently outperformed Gemma 3 in zero-shot and few-shot settings. Our results underscore the importance of task-specific fine-tuning and evidence inclusion for veracity classification in PTBR. All datasets, translation scripts, and evaluation protocols are publicly released to support further research in this area.