A Watermark for Black-Box Language Models

Dara Bahri; John Frederick Wieting

A Watermark for Black-Box Language Models

Dara Bahri, John Frederick Wieting

Published: 06 Mar 2025, Last Modified: 16 Apr 2025WMARK@ICLR2025EveryoneRevisionsBibTeXCC BY 4.0

Track: long paper (up to 9 pages)

Keywords: watermarking, AI-text detection, black-box

Abstract: Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require \emph{white-box} access to the model's next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e. \emph{black-box} access), boasts a \emph{distortion-free} property, and can be chained or nested using multiple secret keys. We provide performance guarantees, demonstrate how it can be leveraged when white-box access is available, and show when it can outperform existing white-box schemes via comprehensive experiments.

Presenter: ~Dara_Bahri1

Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.

Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Submission Number: 40

Loading