Keywords: LLM-Judge, Bias, Fairness
TL;DR: LLMs used for peer review systematically favor authors with high-status affiliations, senior positions, and long publication histories.
Abstract: The adoption of large language models (LLMs) is transforming the peer review process, from assisting reviewers in writing detailed evaluations to generating entire reviews automatically. While these capabilities offer new opportunities, they also raise concerns about fairness and reliability. In this paper, we investigate bias in LLM-generated peer reviews through controlled interventions on author metadata, including affiliation, gender, seniority, and publication history. Our analysis consistently shows a strong affiliation bias favoring authors from highly ranked institutions. We also identify directional preferences linked to seniority and prior publication record, which can meaningfully shift acceptance decisions for papers near the review threshold. Gender effects are smaller but present in several models. Notably, implicit biases become more pronounced when examining token-level soft ratings, suggesting that alignment may mask but not fully eliminate underlying preferences.
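The abstract's "token-level soft ratings" can be illustrated with a small sketch: instead of taking the single rating token the model emits, one reads the model's log-probabilities over candidate score tokens and takes a probability-weighted average. The function and the example log-probabilities below are hypothetical, assuming a 1–10 integer rating scale; the paper's actual extraction procedure may differ.

```python
import math

def soft_rating(token_logprobs):
    """Probability-weighted average over numeric rating tokens.

    token_logprobs: dict mapping candidate next-token strings to
    log-probabilities, e.g. from an LLM's output distribution at the
    position where it emits a 1-10 score.
    """
    scores, probs = [], []
    for tok, lp in token_logprobs.items():
        # Keep only tokens that parse as integer ratings on the 1-10 scale.
        if tok.strip().isdigit():
            s = int(tok.strip())
            if 1 <= s <= 10:
                scores.append(s)
                probs.append(math.exp(lp))
    total = sum(probs)
    # Renormalize over the rating tokens and take the expectation.
    return sum(s * p / total for s, p in zip(scores, probs))

# Made-up log-probabilities: most mass split between "6" and "7".
demo = {"6": math.log(0.55), "7": math.log(0.40), "the": math.log(0.05)}
print(round(soft_rating(demo), 3))  # → 6.421
```

A soft rating like this can shift even when the argmax token does not, which is why subtle metadata-driven preferences can surface here while remaining invisible in the discrete score.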
Submission Number: 107