Toward Watermarking Peer Reviews Generated by Large Language Models

ACL ARR 2025 February Submission 7347 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract:

The integrity of the peer-review process is crucial for maintaining scientific rigor and trust in academic publishing. This process relies on domain experts critically evaluating the merits of submitted manuscripts. However, the increasing use of large language models (LLMs) in academic writing raises concerns about the authenticity and reliability of peer reviews. Previous work has focused on estimating the proportion of AI-generated peer reviews or on developing AI-generated text detectors, yet existing detectors struggle against adversarial attacks and often require domain-specific retraining. To address these challenges, we propose a watermarking framework. Our Query-Aware Response Generation module selectively applies watermarking when a user uploads a research paper. The Watermark Injection method embeds subtle yet detectable signals while preserving scientific terminology. Finally, the Watermark Detection module enables editors and chairs to verify review authenticity. Extensive experiments on ICLR and NeurIPS peer reviews demonstrate that our method outperforms various AI text detectors under adversarial attacks. Our results highlight watermarking as a robust and scalable solution for preserving integrity in AI-assisted peer review. We make our code, dataset, and model public.
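The abstract does not specify the watermarking scheme itself. As a purely illustrative sketch, the snippet below implements a standard "green-list" token-biasing watermark and its z-score detector in the spirit of Kirchenbauer et al. (2023), a common baseline for LLM watermarking; it is not the paper's method. All names (VOCAB, GAMMA, green_list, inject_bias, detect_z) are hypothetical.

```python
# Illustrative green-list watermark sketch (assumed scheme, not the paper's exact method).
import hashlib
import math

VOCAB = ["review", "method", "results", "novel", "clarity", "the", "is", "sound"]
GAMMA = 0.5  # assumed fraction of the vocabulary marked "green"


def green_list(prev_token: str, vocab=VOCAB, gamma=GAMMA) -> set:
    """Pseudo-randomly partition the vocabulary, seeded on the previous token."""
    seed = hashlib.sha256(prev_token.encode()).hexdigest()
    ranked = sorted(vocab, key=lambda w: hashlib.sha256((seed + w).encode()).hexdigest())
    return set(ranked[: int(gamma * len(vocab))])


def inject_bias(logits: dict, prev_token: str, delta: float = 2.0) -> dict:
    """At generation time, add a logit bonus to green-list tokens (injection side)."""
    green = green_list(prev_token)
    return {tok: score + (delta if tok in green else 0.0) for tok, score in logits.items()}


def detect_z(tokens: list) -> float:
    """z-score of green-token hits; a large z flags a likely watermarked review."""
    n = len(tokens) - 1
    if n < 1:
        return 0.0
    hits = sum(tokens[i] in green_list(tokens[i - 1]) for i in range(1, len(tokens)))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

In a pipeline like the one the abstract describes, injection would presumably run inside the response-generation module, while detection would run on the editor or chair side to verify a submitted review.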

Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: educational applications, ethical considerations in NLP applications, security/privacy, transparency
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 7347