Auditing the Judge: Human-Grounded Bias Discovery, Quantification, and Mitigation in LLM Judges

Published: 03 Jun 2026, Last Modified: 03 Jun 2026, Venue: AI4GOOD Workshop 2026 Regular, License: CC BY 4.0
Keywords: LLM-as-a-Judge, Bias Identification, Bias Quantification, Bias Mitigation
Abstract: Large Language Models (LLMs) are increasingly deployed as automated evaluators in evaluation and training pipelines, yet their judgments are often affected by systematic biases that conflict with human preferences. Prior work has identified several known biases and proposed methods for detecting and mitigating them, but these methods lack strong grounding in human evaluation preferences, which is essential to ensuring that the identified biases correspond to actual human judgment behavior. Moreover, they rely heavily on previously discovered bias lists, overlook bias strength, and depend on costly interventions. In this work, we propose HUB-J, an integrated framework grounded in human evaluation preferences that detects, quantifies, and mitigates biases in LLM-as-a-Judge systems. Our approach uses cases where human and LLM judgments disagree to automatically discover interpretable bias factors, and uses cases where they agree to quantify bias strength by applying controlled input modifications and measuring the resulting shifts in model decisions. Finally, building on these quantified biases, we introduce a lightweight, training-free, regression-based mitigation strategy that corrects bias-influenced judgments by removing the estimated bias effects. Empirical results show that HUB-J uncovers both known and novel bias factors, reveals meaningful differences in susceptibility across models, and consistently reduces bias-driven decision flips while generalizing across models.
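The abstract does not give HUB-J's exact formulas, so the following is only a minimal sketch of the two ideas it names: measuring bias strength as the rate of decision flips under controlled modifications, and a regression-style correction that subtracts an estimated bias effect from a judge's score margin. The function names, the use of a score margin (score of response A minus score of response B), and the through-origin least-squares slope are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def flip_rate(original: Sequence[str], perturbed: Sequence[str]) -> float:
    """Fraction of cases whose verdict changes after a controlled modification.

    Hypothetical measure of bias strength; the paper may define it differently.
    """
    if len(original) != len(perturbed) or not original:
        raise ValueError("need two non-empty sequences of equal length")
    flips = sum(o != p for o, p in zip(original, perturbed))
    return flips / len(original)


def bias_slope(feature_deltas: Sequence[float],
               margin_shifts: Sequence[float]) -> float:
    """Least-squares slope through the origin.

    Estimates how much the judge's score margin shifts per unit change in a
    bias feature (for example, added length). Assumed form, for illustration.
    """
    if len(feature_deltas) != len(margin_shifts):
        raise ValueError("sequences must have equal length")
    denom = sum(d * d for d in feature_deltas)
    if denom == 0:
        raise ValueError("feature_deltas must not all be zero")
    return sum(d * s for d, s in zip(feature_deltas, margin_shifts)) / denom


def debias_margin(margin: float, feature_diff: float, slope: float) -> float:
    """Subtract the estimated bias contribution from a score margin."""
    return margin - slope * feature_diff


def verdict(margin: float) -> str:
    """Map a score margin (A minus B) to a pairwise verdict."""
    return "A" if margin > 0 else "B"


# Toy data, invented purely for this example.
strength = flip_rate(["A", "A", "B", "A"], ["A", "B", "B", "B"])
slope = bias_slope([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])
corrected = debias_margin(margin=0.4, feature_diff=2.0, slope=slope)
```

In this toy setup the judge originally prefers A (margin 0.4), but once the estimated effect of the bias feature is removed the margin becomes negative and the verdict changes to B. Estimating the slope from controlled modifications, rather than from raw judge scores, is a design choice of this sketch: it avoids attributing genuine quality differences that happen to correlate with the feature to bias.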
Submission Number: 383