BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

Published: 08 Oct 2025, Last Modified: 15 Oct 2025 · Agents4Science · CC BY 4.0
Keywords: LLM; Research Agent; Review Agent; Integrity; Mitigation; Evaluation
Abstract: The rapid advancement of Large Language Models (LLMs) as both research assistants and peer reviewers creates a critical vulnerability: the potential for fully automated AI-only publication loops where AI-generated research is evaluated by AI reviewers. We investigate this adversarial dynamic by introducing \textbf{BadScientist}, an experimental framework that pits a fabrication-oriented paper generation agent against multi-model LLM review systems. Our generator employs five presentation-manipulation strategies without conducting real experiments: exaggerating performance gains (\textit{TooGoodGains}), cherry-picking comparisons (\textit{BaselineSelect}), creating statistical facades (\textit{StatTheater}), polishing presentation (\textit{CoherencePolish}), and hiding proof oversights (\textit{ProofGap}). We evaluate fabricated papers using LLM reviewers calibrated on ICLR 2025 conference submission data. Our results reveal alarming vulnerabilities: fabricated papers achieve high acceptance rates across strategies, with \textit{TooGoodGains} reaching $67.0\%/82.0\%$ acceptance under different thresholds, and combined strategies achieving $52.0\%/69.0\%$. Even when LLM reviewers flag integrity concerns, they frequently assign acceptance-level scores---a phenomenon we term \textit{concern-acceptance conflict}. Our mitigation strategies, Review-with-Detection (ReD) and Detection-Only (DetOnly), show limited improvements, highlighting the inadequacy of current methods. These findings expose concrete failure modes in AI-driven review systems and demonstrate that presentation manipulation can effectively deceive state-of-the-art LLM reviewers. Our work underscores the urgent need for stronger, integrity-focused review pipelines as AI agents become more prevalent in scientific publishing.
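The abstract reports acceptance under two thresholds (e.g. $67.0\%/82.0\%$ for \textit{TooGoodGains}). As a minimal sketch of how such a dual-threshold acceptance rate could be tallied from LLM reviewer scores, the snippet below enumerates the five manipulation strategies and computes both rates; the 1-10 score scale, the example thresholds (5.5 and 5.0), and the function name `acceptance_rates` are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the BadScientist code): tally acceptance of
# fabricated papers under two score thresholds. Scale and thresholds are
# assumptions; the paper's actual calibration on ICLR 2025 data may differ.
from statistics import mean

STRATEGIES = [
    "TooGoodGains",      # exaggerated performance gains
    "BaselineSelect",    # cherry-picked baseline comparisons
    "StatTheater",       # statistical facades
    "CoherencePolish",   # polished presentation
    "ProofGap",          # hidden proof oversights
]

def acceptance_rates(scores_by_paper, strict=5.5, lenient=5.0):
    """Fraction of papers whose mean reviewer score clears each threshold."""
    means = [mean(scores) for scores in scores_by_paper]
    strict_rate = sum(m >= strict for m in means) / len(means)
    lenient_rate = sum(m >= lenient for m in means) / len(means)
    return strict_rate, lenient_rate

# Example: three fabricated papers, each scored by three LLM reviewers.
example_scores = [[6, 6, 5], [5, 5, 6], [4, 5, 5]]
print(acceptance_rates(example_scores))  # e.g. (0.333..., 0.666...)
```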
Supplementary Material: zip
Submission Number: 284