Abstract: In this study, we examined the effects of integrating data containing divergent information, specifically anti-vaccination narratives, into the training of a GPT-2 language model. The model was fine-tuned on content sourced from anti-vaccination groups and channels on Telegram, with the aim of analyzing its ability to generate coherent and rationalized texts in comparison to a model pre-trained on OpenAI’s WebText dataset. The results demonstrate that fine-tuning a GPT-2 model with biased data leads the model to perpetuate these biases in its responses, albeit with a certain degree of rationalization. This finding underscores the importance of using high-quality and reliable data when training natural language processing models, and highlights the implications for information dissemination through such models. The study also provides social scientists with a tool to explore and understand the complexities and challenges of public health misinformation through language models, particularly in the context of vaccines.
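To illustrate the kind of pipeline the abstract describes, the following is a minimal sketch of fine-tuning a pre-trained GPT-2 model on a plain-text corpus using the Hugging Face Transformers library; the file path, block size, and hyperparameters are illustrative assumptions, not the settings used in the study.

```python
# Minimal sketch: causal language-model fine-tuning of GPT-2 on a text corpus.
# "telegram_corpus.txt" is a hypothetical file with one scraped message per line.
from transformers import (
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    TextDataset,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # weights pre-trained on WebText

# Concatenate the corpus into fixed-length token blocks for causal LM training.
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="telegram_corpus.txt",  # hypothetical path
    block_size=128,
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt2-finetuned",
        num_train_epochs=3,
        per_device_train_batch_size=4,
    ),
    data_collator=collator,
    train_dataset=train_dataset,
)
trainer.train()

# Sample from the fine-tuned model, e.g. to compare its completions with those
# of the unmodified pre-trained model on the same prompt.
prompt_ids = tokenizer("Vaccines are", return_tensors="pt").input_ids
output = model.generate(prompt_ids, max_length=60, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Comparing generations from the fine-tuned and base checkpoints on identical prompts is one straightforward way to surface the biases that the fine-tuning corpus introduces.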