Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge

ACL ARR 2025 May Submission 6407 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: LLM-as-a-Judge employs large language models (LLMs), such as GPT-4, to evaluate the quality of LLM-generated responses, and has gained popularity for its cost-effectiveness and strong alignment with human evaluations. However, training proxy judge models on evaluation data generated by powerful teacher models introduces a critical yet previously overlooked issue: teacher preference bias, where the proxy judge model learns a biased preference for responses from the teacher model. To tackle this problem, we propose a novel setting that incorporates an additional assistant model, which is not biased toward the teacher model's responses, to complement the training data. Building on this setup, we introduce AGDe-Judge, a three-stage framework designed to debias both the labels and the feedback in the training data. Extensive experiments demonstrate that AGDe-Judge effectively reduces teacher preference bias while maintaining strong performance across six evaluation benchmarks. Code is available at https://anonymous.4open.science/r/AGDe-Judge-E352.
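To make the notion of teacher preference bias concrete, below is a minimal Python sketch of one way such a bias could be quantified: measuring how often a judge picks the teacher-authored response in pairwise comparisons. All names here (PairwiseCase, teacher_win_rate, the judge callable) are illustrative assumptions, not the paper's actual metric or API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PairwiseCase:
    """One head-to-head comparison shown to the judge."""
    response_a: str
    response_b: str
    a_is_teacher: bool  # True if response_a came from the teacher model


def teacher_win_rate(
    judge: Callable[[str, str], str],  # returns "A" or "B"
    cases: Iterable[PairwiseCase],
) -> float:
    """Fraction of comparisons in which the judge picks the
    teacher-authored response."""
    wins = total = 0
    for case in cases:
        verdict = judge(case.response_a, case.response_b)
        # The judge picked the teacher iff its verdict points at
        # whichever side the teacher response occupies.
        picked_teacher = (verdict == "A") == case.a_is_teacher
        wins += int(picked_teacher)
        total += 1
    return wins / max(total, 1)


if __name__ == "__main__":
    # Dummy judge that always prefers the longer response.
    length_judge = lambda a, b: "A" if len(a) >= len(b) else "B"
    cases = [
        PairwiseCase("a long teacher answer...", "short", a_is_teacher=True),
        PairwiseCase("short", "a long teacher answer...", a_is_teacher=False),
    ]
    print(f"teacher win rate: {teacher_win_rate(length_judge, cases):.2f}")
```

A win rate substantially above what blind human raters report for the same pairs would be one symptom of the teacher preference bias that AGDe-Judge targets.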
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: LLM, model bias/fairness evaluation, debiasing, proxy judge model
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 6407