Treat Bias as Noise: Training Bias-Robust LLM Reasoning via Reinforcement Learning

Published: 03 Jun 2026, Last Modified: 03 Jun 2026
Venue: AI4GOOD Workshop 2026 Regular
License: CC BY 4.0
Keywords: Large Language Models, Cognitive Bias, Reinforcement Learning, Trustworthy AI, LLM-as-a-Judge
Abstract: Large language models (LLMs) increasingly serve as reasoners and are being considered as automated evaluators, yet they remain susceptible to cognitive biases: they often alter their reasoning when faced with spurious prompt-level cues such as consensus claims or authority appeals. Existing mitigations based on prompting or supervised fine-tuning fail to generalize because they modify surface behavior without changing the optimization objective that makes bias cues attractive. We propose Epistemic Independence Training (EIT), a reinforcement learning framework built around a simple principle: models should learn that bias cues are unreliable, rather than learning either to follow them or to reject them. EIT trains on balanced conflict examples in which each injected cue is equally likely to support the correct answer or an incorrect one. It uses a reward that penalizes bias-following errors without rewarding agreement with a cue that happens to be correct, which makes the cue non-predictive of reward. On controlled MMLU-Pro reasoning tasks with injected bias, EIT improves both accuracy and robustness for Qwen3-1.7B and Qwen3-4B when the bias points to a wrong answer, while preserving performance when the bias aligns with the truth. Although trained only on bandwagon bias, EIT generalizes along two out-of-domain axes: held-out MMLU-Pro subjects and unseen bias types (authority, distraction, and verbosity). EIT-trained Qwen3-4B also outperforms untrained Qwen3-8B and Qwen3-14B on bias resistance, indicating that targeted training is more effective than model scaling alone. Code and data are available at https://anonymous.4open.science/r/bias-mitigation-with-rl-BC47.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 396