Toxic Text Classification in Portuguese: Is LLaMA 3.1 8B All You Need?

Amanda Oliveira, Pedro H. L. Silva, Valéria de Carvalho Santos, Gladston Moreira, Vander L. S. Freitas, Eduardo José da S. Luz

Published: 2024, Last Modified: 12 Nov 2025. Venue: STIL 2024. License: CC BY-SA 4.0

Abstract: Recognizing toxic and hate speech on social media platforms is important because of the significant risks such content poses to users and the digital ecosystem. Current state-of-the-art models, such as BERTimbau, have set benchmarks for Portuguese text classification, yet accurately detecting toxic content remains challenging. This paper investigates the effectiveness of fine-tuning a smaller, open-source, decoder-only model, LLaMA 3.1 8B (4-bit), for this task. We propose an iterative prompt evolution method to optimize the model’s performance. Our results show that fine-tuning improves the LLaMA model’s F1-score from 0.61 to 0.75, surpassing BERTimbau in precision and matching the performance of GPT-4o mini. However, the approach depends on the quality of the language models used for prompt evolution, highlighting the need for further research on robustness in this area.

External IDs: dblp:conf/stil/0003SSMFL24