PaperDoctor: Evidence-Grounded and Actionable Feedback for Scientific Papers in Progress

Published: 30 May 2026, Last Modified: 30 May 2026, Venue: ICML2026-AI4Science Poster, Readers: Everyone, License: CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: Paper Feedback, AutoResearch, Peer Review
TL;DR: PaperDoctor: Evidence-Grounded and Actionable Feedback for Scientific Papers in Progress
Abstract: Autoresearch agents are reshaping the research pipeline, but they also let flawed claims enter the literature at scale. Human advisors catch such issues on in-progress drafts through careful, traceable feedback, yet advisor-style assessment requires extensive manual effort and does not scale. To shift automated paper assessment from a judge to a diagnostician, we introduce PaperDoctor, an agent framework for pre-submission feedback with three key innovations: (i) Holistic hierarchical pipeline. Every paper is assessed across writing, layout, references, code, theory, prior work, and experiments through three layers: L1 paper-only screening runs cheaply on every submission; L2 typed verifiers route each claim to the skill that owns its evidence; and L3 reproducers rerun experiments by priority. (ii) Evidence-grounded actionable feedback. Each finding is a triple of an observation (Why), a pointer to a specific location such as a sentence, equation, or code line (Where), and a revision suggestion (How), making every critique auditable and actionable. (iii) Effective experimental reproduction. Beyond reading the paper, PaperDoctor selectively rebuilds and reruns experiments based on claim importance and compute budget, surfacing reproducibility gaps and quantitative limitations that are invisible from the manuscript alone. We evaluate PaperDoctor in a human study with junior students on their pre-submission papers, where it achieves 85% accuracy and is notably strong at validating claim assumptions. We further evaluate PaperDoctor on 60 manuscripts across machine learning, the natural sciences, and the social sciences, covering human- and agent-authored papers with code. Overall, PaperDoctor produces more grounded feedback than human and LLM reviewers, its findings track paper quality, and its reproduction stage surfaces limitations missed in the paper’s main body. To empower the community, we develop a demo interface that lets authors browse findings grounded in their paper.
PaperDoctor reframes automated paper assessment as a diagnostic process rather than a verdict, taking a concrete step toward AI advisors that help authors raise the quality of scientific writing in the autoresearch era.
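The Why/Where/How finding format and the budget-aware choice of experiments to rerun, as described in the abstract, can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from PaperDoctor: the `Finding` and `Claim` classes, the `importance` and `rerun_cost` fields, and the greedy selection policy are assumptions made only for illustration, since the abstract does not specify how importance and budget are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One piece of feedback as a (Why, Where, How) triple."""
    why: str    # the observation
    where: str  # pointer to a sentence, equation, or code line
    how: str    # the revision suggestion


@dataclass(frozen=True)
class Claim:
    """A claim that could be checked by rerunning an experiment."""
    text: str
    importance: float  # hypothetical score; higher means more central
    rerun_cost: float  # hypothetical cost, e.g. GPU-hours


def select_for_reproduction(claims, budget):
    """Greedy policy (an assumption): visit claims from most to least
    important and keep each one whose cost fits the remaining budget."""
    chosen = []
    remaining = budget
    for claim in sorted(claims, key=lambda c: c.importance, reverse=True):
        if claim.rerun_cost <= remaining:
            chosen.append(claim)
            remaining -= claim.rerun_cost
    return chosen


# Illustrative inputs only; these are not results from the paper.
example_claims = [
    Claim("Main accuracy gain", importance=0.9, rerun_cost=5.0),
    Claim("Ablation on depth", importance=0.5, rerun_cost=4.0),
    Claim("Runtime comparison", importance=0.3, rerun_cost=1.0),
]
selected = select_for_reproduction(example_claims, budget=6.0)

example_finding = Finding(
    why="The reported gain has no variance estimate.",
    where="Section 4, Table 2, row 3",
    how="Report mean and standard deviation over several seeds.",
)
```

With a budget of 6.0, the sketch keeps the most important claim, skips the ablation because it no longer fits, and still fits the cheap runtime check. A real system might instead optimize importance per unit cost; the greedy rule is chosen here only because it is easy to audit.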
Submission Number: 19