ReviewEval: An Evaluation Framework for AI-Generated Reviews

ACL ARR 2025 May Submission 1520 Authors

17 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: The escalating volume of academic research, coupled with a shortage of qualified reviewers, necessitates innovative approaches to peer review. In this work, we propose: (1) ReviewEval, a comprehensive evaluation framework for AI-generated reviews that measures alignment with human assessments, verifies factual accuracy, assesses analytical depth, identifies the degree of constructiveness, and checks adherence to reviewer guidelines; and (2) ReviewAgent, an LLM-based review-generation agent featuring a novel alignment mechanism that tailors feedback to target conferences and journals, a self-refinement loop that iteratively optimizes its intermediate outputs, and an external improvement loop that uses ReviewEval to refine the final reviews. ReviewAgent improves actionable insights by 6.78% and 47.62% over existing AI baselines and expert reviews, respectively. It also boosts analytical depth by 3.97% and 12.73% and enhances adherence to guidelines by 10.11% and 47.26%, respectively. This paper establishes essential metrics for AI-based peer review and substantially enhances the reliability and impact of AI-generated reviews in academic research.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Large Language Models, AI-generated reviews, Evaluation Framework
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 1520