Position Paper: How Should We Responsibly Adopt LLMs in the Peer Review Process?

ACL ARR 2025 July Submission142 Authors

24 Jul 2025 (modified: 29 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: This position paper presents a new perspective on the use of Large Language Models (LLMs) in the artificial intelligence paper review process. We first critique the current tendency to use LLMs primarily for simple review text generation, arguing that this approach overlooks more meaningful applications that keep human expertise at the core of evaluation. We instead advocate leveraging LLMs to support key aspects of the review process: verifying the reproducibility of experimental results, checking the correctness and relevance of citations, and assisting with ethics-review flagging. For example, LLM-agent tools that generate code from research papers have recently enabled automated assessment of a paper's reproducibility, improving the transparency and reliability of research. By reorienting LLM usage toward these targeted, assistive roles, we outline a pathway for more effective and responsible integration of LLMs into peer review, supporting both reviewer efficiency and the integrity of the scientific process.
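The reproducibility-check workflow sketched in the abstract can be illustrated with a minimal, hedged example. This is not the authors' implementation: the function names, the prompt, the sandboxed runner, and the tolerance value below are all illustrative assumptions, with the LLM call abstracted as a plain callable so no particular API is implied.

```python
# Minimal sketch of the assistive workflow the abstract advocates: an LLM
# drafts runnable code from a paper's method description, the code is run
# in isolation, and the observed metric is compared against the reported
# one. Everything here (names, prompt, tolerance) is an illustrative
# assumption, not the paper's actual system.
from typing import Callable

def reproducibility_check(
    method_description: str,
    reported_metric: float,
    generate_code: Callable[[str], str],    # wrapper around any LLM API
    run_sandboxed: Callable[[str], float],  # executes code, returns metric
    tolerance: float = 0.01,                # assumed acceptance threshold
) -> dict:
    """Draft code from the paper text, run it sandboxed, compare metrics.

    Returns a report for the human reviewer rather than a verdict.
    """
    prompt = (
        "Write a self-contained script implementing the following method "
        "and printing its evaluation metric:\n" + method_description
    )
    code = generate_code(prompt)      # LLM drafts candidate code
    observed = run_sandboxed(code)    # isolated, reviewer-approved execution
    return {
        "code": code,
        "reported": reported_metric,
        "observed": observed,
        "within_tolerance": abs(observed - reported_metric) <= tolerance,
    }
```

Returning a report instead of an accept/reject decision reflects the paper's position: the LLM assists with verification while the human reviewer remains at the core of evaluation.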
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: human-AI interaction/cooperation
Contribution Types: Position papers
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Limitations: Lines 696-716
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: No
B1 Elaboration: We used Gemini 2.5, a proprietary model, for a small case study in Appendix B.
B2 Discuss The License For Artifacts: No
B2 Elaboration: Gemini 2.5 is a proprietary commercial model, so no artifact license applies to our use.
B3 Artifact Use Consistent With Intended Use: N/A
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: N/A
B6 Statistics For Data: N/A
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: We state that we used the Gemini 2.5 Pro model via its official web interface.
C2 Experimental Setup And Hyperparameters: No
C2 Elaboration: We used the official web interface, which does not expose hyperparameter settings.
C3 Descriptive Statistics: No
C3 Elaboration: The case study is very small in scale and consists of a single run.
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: We used AI assistants only to revise wording.
Author Submission Checklist: Yes
Submission Number: 142