Keywords: Personalized Debiasing, Dynamic Intervention, Large Language Models, Bias-Utility Trade-off
TL;DR: We introduce PersonBias, a plug-and-play module that detects and mitigates social biases in LLM outputs by dynamically adapting to individual user preferences, balancing fairness with response quality.
Abstract: Social bias in large language model (LLM) outputs has emerged as a critical challenge in artificial intelligence. While existing bias detection methods pursue comprehensive identification and elimination of implicit biases, this \textit{one-size-fits-all} approach has significant limitations: excessive bias correction causes responses to deviate from the user's query intent; comprehensive detection demands extensive human annotation and computational resources; and, critically, user heterogeneity means that individuals with diverse backgrounds and personality traits exhibit varying sensitivities to different bias types. To address these challenges, we propose PersonBias, a lightweight, personalized debiasing framework that balances bias mitigation with response quality. Our approach leverages LLMs to automatically extract user personality features from conversational context, eliminating the need for explicit demographic data collection. We develop a dual-tower encoder architecture with cross-attention to model user-specific bias sensitivities, employing parameter-efficient fine-tuning that freezes the encoder parameters and optimizes only the projection layers and attention mechanisms. Rather than requiring model-specific fine-tuning, PersonBias operates through real-time intervention during generation, dynamically evaluating and adjusting outputs at fixed token intervals to prevent bias accumulation while maintaining relevance and utility. Experiments on multi-turn dialogue datasets demonstrate that PersonBias achieves superior bias reduction and utility preservation compared to prompt-based and fine-tuning baselines, offering a practical, adaptive solution for personalized fairness in LLMs.
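The abstract describes a dual-tower encoder with cross-attention whose backbones stay frozen while only projection layers and the attention block are trained. The following minimal sketch illustrates that parameter-efficient setup; the encoder checkpoint, dimensions, and number of bias types are assumptions, not details from the submission.

```python
# Illustrative sketch (not the authors' code): dual-tower scorer with frozen
# encoder backbones; only the projections and cross-attention are trainable.
import torch
import torch.nn as nn
from transformers import AutoModel

class DualTowerBiasScorer(nn.Module):
    def __init__(self, encoder_name="sentence-transformers/all-MiniLM-L6-v2",
                 hidden_dim=384, proj_dim=256, num_bias_types=9):
        super().__init__()
        # One tower for user personality features, one for the candidate response.
        self.user_encoder = AutoModel.from_pretrained(encoder_name)
        self.text_encoder = AutoModel.from_pretrained(encoder_name)
        # Parameter-efficient fine-tuning: freeze both encoder backbones.
        for p in self.user_encoder.parameters():
            p.requires_grad = False
        for p in self.text_encoder.parameters():
            p.requires_grad = False
        # Only the projection layers, cross-attention, and head are optimized.
        self.user_proj = nn.Linear(hidden_dim, proj_dim)
        self.text_proj = nn.Linear(hidden_dim, proj_dim)
        self.cross_attn = nn.MultiheadAttention(proj_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(proj_dim, num_bias_types)  # per-type sensitivity scores

    def forward(self, user_inputs, text_inputs):
        u = self.user_proj(self.user_encoder(**user_inputs).last_hidden_state)
        t = self.text_proj(self.text_encoder(**text_inputs).last_hidden_state)
        # User-feature tokens attend to the response; the pooled output scores
        # how strongly the text triggers each bias type for this user.
        attended, _ = self.cross_attn(query=u, key=t, value=t)
        return torch.sigmoid(self.head(attended.mean(dim=1)))
```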
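The abstract also describes real-time intervention at fixed token intervals rather than fine-tuning the base model. A rough sketch of that loop follows, assuming a Hugging Face causal LM and tokenizer, the scorer above, and a hypothetical `rewrite_span` helper that adjusts a flagged continuation; the interval and threshold values are placeholders.

```python
# Illustrative sketch of fixed-interval intervention during generation:
# decode a chunk, score the partial output against the user's sensitivities,
# and adjust flagged spans before continuing, so bias does not accumulate.
import torch

@torch.no_grad()
def generate_with_intervention(llm, tokenizer, scorer, user_inputs, prompt,
                               interval=32, max_new_tokens=256, threshold=0.5,
                               rewrite_span=None):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    prompt_len = ids.shape[1]
    while ids.shape[1] - prompt_len < max_new_tokens:
        # Decode the next fixed-size chunk of tokens.
        ids = llm.generate(ids, max_new_tokens=interval, do_sample=True,
                           pad_token_id=tokenizer.eos_token_id)
        new_text = tokenizer.decode(ids[0, prompt_len:], skip_special_tokens=True)
        text_inputs = tokenizer(new_text, return_tensors="pt")
        # Score the partial output for this user's bias sensitivities.
        scores = scorer(user_inputs, text_inputs)
        if scores.max() > threshold and rewrite_span is not None:
            # Hypothetical helper: revise the flagged span, then resume decoding
            # from the adjusted text.
            revised = rewrite_span(new_text, scores)
            ids = tokenizer(prompt + revised, return_tensors="pt").input_ids
        if ids[0, -1].item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0, prompt_len:], skip_special_tokens=True)
```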
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 8599