Evaluating the Impact of Reviewer Guideline Design on LLM-Based Automated Peer Review

ACL ARR 2026 January Submission 9217 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Peer Review, LLM, Guidelines, Prompt Engineering, Rubric Design
Abstract: Peer review is an essential process in scientific research, yet the growing workload has made its automation increasingly necessary. In this study, we analyze how different types of reviewer guidelines, such as official conference guidelines and reviewer-imitating guidelines distilled from high-quality human reviews, affect automated peer review. Our experiments show that official conference guidelines produce review results most consistent with human judgments, suggesting that evaluation criteria refined through conference practice also serve as effective guidance for automated reviewing. In contrast, reviewer-imitating guidelines, especially those enforcing strict rubric-style scoring, consistently degrade automated review performance, highlighting the importance of allowing subjective and holistic scoring.
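Below is a minimal sketch, not the authors' implementation, of how the two guideline variants described in the abstract might be injected into an LLM review prompt. The guideline texts, the `build_review_prompt` helper, and the `call_llm` placeholder are all hypothetical assumptions introduced here for illustration only.

```python
# Illustrative sketch of prompting an LLM reviewer with different guideline types.
# All names below (OFFICIAL_GUIDELINES, REVIEWER_IMITATING_GUIDELINES, call_llm)
# are hypothetical placeholders, not taken from the paper.

OFFICIAL_GUIDELINES = """Assess soundness, excitement, and clarity.
Justify each judgment with evidence from the paper."""

REVIEWER_IMITATING_GUIDELINES = """Score every criterion on a strict 1-5 rubric:
1 = fatally flawed ... 5 = flawless. Do not deviate from the rubric."""


def build_review_prompt(paper_text: str, guidelines: str) -> str:
    """Combine one guideline variant with the paper under review."""
    return (
        "You are a peer reviewer. Follow these guidelines:\n"
        f"{guidelines}\n\n"
        f"Paper:\n{paper_text}\n\n"
        "Write a review and end with 'Overall score: <1-10>'."
    )


def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call (e.g., a chat-completion request)."""
    raise NotImplementedError("Wire this to your LLM provider of choice.")


if __name__ == "__main__":
    paper = "..."  # full paper text would go here
    for name, guidelines in [("official", OFFICIAL_GUIDELINES),
                             ("reviewer-imitating", REVIEWER_IMITATING_GUIDELINES)]:
        prompt = build_review_prompt(paper, guidelines)
        print(f"--- {name} guideline prompt ({len(prompt)} chars) ---")
```

In such a setup, the resulting scores from each guideline variant would be compared against human reviewer judgments to measure consistency, which is the comparison the abstract reports.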
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: automatic creation and evaluation of language resources
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 9217