Keywords: Large Language Models (LLMs), Moral Bias, Fine-Tuning, Cross-Bias Effects, Fairness in AI, Bias Mitigation
Abstract: Large language models (LLMs) exhibit remarkable capabilities across diverse natural language processing tasks, but they remain vulnerable to inheriting and amplifying biases from training data. This paper investigates the effects of fine-tuning LLMs using datasets that contain moral bias. Two key findings are reported. First, as dataset size increases, model bias follows a non-linear trajectory: initially intensifying and later diminishing. Second, injecting a single type of bias can induce changes across other bias categories. For example, fine-tuning with gender-biased data reduced the prevalence of age, regional, and racial biases. These results suggest complex interdependencies between bias categories and emphasize the importance of considering dataset scale in bias mitigation strategies.
Submission Number: 259