Keywords: Large Language Models (LLMs), Moral Bias, Fine-Tuning, Cross-Bias Effects, Fairness in AI, Bias Mitigation
Abstract: Large language models (LLMs) exhibit remarkable capabilities across diverse natural language processing tasks, but they remain vulnerable to inheriting and amplifying biases from training data. This paper investigates the effects of fine-tuning LLMs using datasets that contain moral bias. Two key findings are reported. First, as dataset size increases, model bias follows a non-linear trajectory: initially intensifying and later diminishing. Second, injecting a single type of bias can induce changes across other bias categories. For example, fine-tuning with gender-biased data reduced the prevalence of age, regional, and racial biases. These results suggest complex interdependencies between bias categories and emphasize the importance of considering dataset scale in bias mitigation strategies.
Submission Number: 259