Mitigating Gender Bias in Code Large Language Models via Multi-Scales Model Editing

ACL ARR 2025 May Submission4871 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: With innovations in model architecture and the construction of high-quality code datasets, code large language models (LLMs) have developed rapidly. However, because most training samples are unfiltered, code LLMs are inevitably influenced by toxic samples and thus exhibit social biases, among which gender bias in relation to profession is the most common. Services built on code generated by these models will likewise carry profession-related gender bias, ultimately threatening the security and fairness of those services for people in different professions. No previous work specifically explores gender bias in relation to profession in code LLMs. To fill this gap, we propose a dataset named GenBiasPro-CG ($\textbf{Gen}$der $\textbf{Bias}$ in relation to $\textbf{Pro}$fession in $\textbf{C}$ode $\textbf{G}$eneration). Alongside this dataset, we propose an evaluation metric named FBS ($\textbf{F}$actual $\textbf{B}$ias $\textbf{S}$core), which measures the degree of profession-related gender bias in code LLMs by analyzing the gap between their outputs and the real world. For mitigating gender bias in generative models, model editing is considered a promising technique; however, existing model editing methods for debiasing face a variety of issues. We therefore develop a new model editing method named MSME ($\textbf{M}$ulti-$\textbf{S}$cales $\textbf{M}$odel $\textbf{E}$diting), which adjusts model parameters at four scales: layer, module, row, and neuron. At the neuron scale in particular, fine-tuning a minimal number of parameters achieves a strong debiasing effect.
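The abstract names FBS but does not give its formula here. As a rough illustration only, the sketch below assumes FBS compares per-profession gender ratios in model completions against real-world occupational statistics and aggregates the gaps; the function name, the input ratios, and the mean-absolute-gap aggregation are all assumptions for illustration, not the paper's definition.

```python
from typing import Dict

def factual_bias_score(
    model_gender_ratio: Dict[str, float],
    real_world_ratio: Dict[str, float],
) -> float:
    """Illustrative FBS-style metric (assumed form): the mean absolute gap
    between the fraction of female-gendered completions the model produces
    for each profession and the real-world fraction of women in it."""
    professions = model_gender_ratio.keys() & real_world_ratio.keys()
    gaps = [abs(model_gender_ratio[p] - real_world_ratio[p]) for p in professions]
    # 0.0 means the model's outputs match real-world statistics exactly.
    return sum(gaps) / len(gaps)

# Hypothetical usage with made-up ratios for two professions.
print(factual_bias_score(
    {"nurse": 0.95, "engineer": 0.10},   # assumed model output ratios
    {"nurse": 0.88, "engineer": 0.15},   # assumed real-world ratios
))
```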
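Likewise, the abstract describes MSME's neuron scale as fine-tuning a minimal number of parameters but does not spell out the mechanics. A minimal PyTorch sketch of one way to restrict gradient updates to selected neurons follows; the helper name and the criterion for choosing which neurons to edit are assumptions, not the paper's method.

```python
from typing import List

import torch
import torch.nn as nn

def restrict_updates_to_neurons(linear: nn.Linear, neuron_ids: List[int]) -> None:
    """Mask gradients so an optimizer step only updates the weight rows
    (and bias entries) of the selected output neurons, leaving every
    other parameter of the layer untouched."""
    weight_mask = torch.zeros_like(linear.weight)
    weight_mask[neuron_ids] = 1.0
    linear.weight.register_hook(lambda grad: grad * weight_mask)
    if linear.bias is not None:
        bias_mask = torch.zeros_like(linear.bias)
        bias_mask[neuron_ids] = 1.0
        linear.bias.register_hook(lambda grad: grad * bias_mask)

# Hypothetical usage: edit three neurons of one layer during fine-tuning.
layer = nn.Linear(768, 768)
restrict_updates_to_neurons(layer, neuron_ids=[3, 17, 42])
```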
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Large language models, gender bias, model editing
Languages Studied: English
Submission Number: 4871