Keywords: LLM Peer Review; Overfitting
TL;DR: LLM peer review is vulnerable to stylistic tweaks that exploit rubric cues; simple evidence-based defences improve robustness, underscoring the need for careful prompting and transparency in AI reviewing.
Abstract: Peer review by large language models (LLMs) is susceptible to "overfitting" on rubric cues. Small stylistic modifications can influence how AI reviewers score a paper, yet simple defences might mitigate this vulnerability. We present a miniature experimental reproduction of the Review-Overfitting Challenge. Four machine learning abstracts from arXiv were assessed against a six-item rubric. We then performed an AI-style attack by rewriting the abstracts to emphasise novelty without altering factual content. Papers initially rated borderline flipped to accept. A rubric-anchored defence eliminated the flips, demonstrating that requiring evidence for each criterion improves robustness. Our study underscores the need for careful prompting and transparency when deploying AI reviewers.
Supplementary Material: zip
Submission Number: 95