DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process

ACL ARR 2025 February Submission7607 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: Large Language Models (LLMs) are increasingly used in scientific research assessment, particularly in automated paper review. However, existing LLM-based review systems face significant challenges, including limited domain expertise, hallucinated reasoning, and a lack of structured evaluation. To address these limitations, we introduce DeepReview, a multi-stage framework designed to emulate expert reviewers by incorporating structured analysis, literature retrieval, and evidence-based argumentation. Using DeepReview-13K, a curated dataset with structured annotations, we train DeepReviewer-14B, which outperforms CycleReviewer-70B while using fewer tokens. In its best mode, DeepReviewer-14B achieves win rates of 88.21% and 80.20% against GPT-o1 and DeepSeek-R1, respectively, in evaluations. Our work sets a new benchmark for LLM-based paper review, with all resources publicly available.

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: Large language model, paper review, deep thinking

Contribution Types: NLP engineering experiment, Data resources

Languages Studied: English

Submission Number: 7607
