LLM4Review: A Multi-Agent Framework for Autonomous Peer Review of AI-Written Research

Agents4Science 2025 Conference Submission 310 Authors

16 Sept 2025 (modified: 06 Dec 2025) · Agents4Science 2025 Conference · Desk Rejected Submission · CC BY 4.0
Keywords: multi-agent systems, large language models, automated peer review
Abstract: We introduce a closed-loop, multi-agent framework that assigns large language models (LLMs) to the canonical roles of Author, Reviewer, Reviser, and Meta-Reviewer, thereby emulating the end-to-end scientific publishing workflow. The system follows a round-based protocol in which an Author drafts a manuscript, independent Reviewers return rubric-based critiques and recommendations, a Reviser converts critiques into a structured change plan and a point-by-point response letter, and a Meta-Reviewer issues an accept/continue/reject decision under explicit thresholds and compute/latency budgets. Quantitatively, we aggregate reviewer scores with reliability-aware weighting and track improvements in an overall quality metric across rounds, while measuring reviewer agreement, edit magnitude, and quality–cost trade-offs. Diagnostics reveal predictable biases (order, verbosity, self-model) that are mitigated by independence, aggregation, and optional cross-review. Robustness probes demonstrate that document-borne prompt injections can shift recommendations, motivating sanitization and provenance logging that substantially reduce decision drift. The framework yields auditable artifacts at every step (manuscripts, reviews, responses, meta-decisions) and requires no external datasets, enabling reproducible evaluation of autonomous LLM science workflows. We release prompts, logs, topic bank, and analysis code to facilitate replication and future extensions.
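The abstract mentions aggregating reviewer scores with reliability-aware weighting and issuing accept/continue/reject decisions under explicit thresholds. A minimal sketch of one such scheme is shown below; the function names, the weighting rule, and the threshold values are illustrative assumptions, not the authors' released code.

```python
"""Illustrative sketch: reliability-weighted reviewer score aggregation
and a threshold-based meta-decision. All names and numbers are hypothetical."""

from dataclasses import dataclass


@dataclass
class Review:
    reviewer_id: str
    score: float        # rubric-based overall score, e.g. on a 1-10 scale
    reliability: float  # reliability weight in [0, 1]


def aggregate_scores(reviews: list[Review]) -> float:
    """Reliability-weighted mean of reviewer scores for one round."""
    total_weight = sum(r.reliability for r in reviews)
    if total_weight == 0:
        # Fall back to an unweighted mean if no reliability signal exists.
        return sum(r.score for r in reviews) / len(reviews)
    return sum(r.score * r.reliability for r in reviews) / total_weight


def meta_decision(round_score: float, accept_at: float = 7.5, reject_at: float = 4.0) -> str:
    """Threshold-based decision: accept, reject, or continue to another round."""
    if round_score >= accept_at:
        return "accept"
    if round_score < reject_at:
        return "reject"
    return "continue"


if __name__ == "__main__":
    reviews = [
        Review("R1", score=6.5, reliability=0.9),
        Review("R2", score=7.0, reliability=0.6),
        Review("R3", score=5.5, reliability=0.8),
    ]
    s = aggregate_scores(reviews)
    print(f"round score = {s:.2f}, decision = {meta_decision(s)}")
```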
Supplementary Material: zip
Submission Number: 310