Evaluating the Impact of Reviewer Guideline Design on LLM-Based Automated Peer Review

ACL ARR 2026 January Submission 9217 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Peer Review, LLM, Guidelines, Prompt Engineering, Rubric Design
Abstract: Peer review is an essential process in scientific research, yet the growing workload has made its automation increasingly necessary. In this study, we analyze how different types of reviewer guidelines, such as official conference guidelines and reviewer-imitating guidelines distilled from high-quality human reviews, affect automated peer review. Our experiments show that official conference guidelines produce review results most consistent with human judgments, suggesting that evaluation criteria refined through conference practice also serve as effective guidance for automated reviewing. In contrast, reviewer-imitating guidelines, especially those enforcing strict rubric-style scoring, consistently degrade automated review performance, highlighting the importance of allowing subjective and holistic scoring.
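Below is a minimal sketch, not the authors' implementation, of how the two guideline variants described in the abstract might be injected into an LLM review prompt. The guideline texts, the `build_review_prompt` helper, and the `call_llm` placeholder are all hypothetical assumptions introduced here for illustration only.

```python
# Illustrative sketch of prompting an LLM reviewer with different guideline types.
# All names below (OFFICIAL_GUIDELINES, REVIEWER_IMITATING_GUIDELINES, call_llm)
# are hypothetical placeholders, not taken from the paper.

OFFICIAL_GUIDELINES = """Assess soundness, excitement, and clarity.
Justify each judgment with evidence from the paper."""

REVIEWER_IMITATING_GUIDELINES = """Score every criterion on a strict 1-5 rubric:
1 = fatally flawed ... 5 = flawless. Do not deviate from the rubric."""


def build_review_prompt(paper_text: str, guidelines: str) -> str:
    """Combine one guideline variant with the paper under review."""
    return (
        "You are a peer reviewer. Follow these guidelines:\n"
        f"{guidelines}\n\n"
        f"Paper:\n{paper_text}\n\n"
        "Write a review and end with 'Overall score: <1-10>'."
    )


def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call (e.g., a chat-completion request)."""
    raise NotImplementedError("Wire this to your LLM provider of choice.")


if __name__ == "__main__":
    paper = "..."  # full paper text would go here
    for name, guidelines in [("official", OFFICIAL_GUIDELINES),
                             ("reviewer-imitating", REVIEWER_IMITATING_GUIDELINES)]:
        prompt = build_review_prompt(paper, guidelines)
        print(f"--- {name} guideline prompt ({len(prompt)} chars) ---")
```

In such a setup, the resulting scores from each guideline variant would be compared against human reviewer judgments to measure consistency, which is the comparison the abstract reports.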
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: automatic creation and evaluation of language resources
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 9217