Bypassing LLM Watermarks with Color-Aware Substitutions

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Watermarking approaches have been proposed to determine whether circulating text was written by a human or generated by a large language model (LLM). The state-of-the-art strategy of Kirchenbauer et al. (2023a) biases the LLM to generate specific “green” tokens. However, the robustness of this watermarking method remains an open problem. Existing attack methods do not use color information (whether a token is green or not) and may fail to evade detection on longer texts. We propose Self Color Testing-based Substitution (SCTS), the first “color-aware” attack. SCTS obtains color information by strategically prompting the watermarked LLM and comparing output frequencies, which allows it to determine token colors. It then substitutes green tokens with red ones. In our experiments, SCTS evades watermark detection with fewer edits than related work. We further show, both theoretically and empirically, that SCTS can remove the watermark from arbitrarily long watermarked text.
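The color-aware substitution idea in the abstract can be sketched in a toy model. This is not the authors' implementation: it assumes a simplified hash-based green-list rule in the style of Kirchenbauer et al. (2023a), a 100-token dummy vocabulary, and hypothetical helper names (`is_green`, `z_score`, `color_aware_substitute`); it also ignores fluency, which the real SCTS attack preserves via synonym choices.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green" at each step
VOCAB = [f"tok{i}" for i in range(100)]  # toy vocabulary, purely illustrative

def is_green(prev_token: str, token: str) -> bool:
    """Toy Kirchenbauer-style rule: hash the previous token to seed a
    permutation of the vocabulary; the first GAMMA fraction is green."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    perm = VOCAB[:]
    rng.shuffle(perm)
    return token in set(perm[: int(GAMMA * len(perm))])

def z_score(tokens: list[str]) -> float:
    """Detector statistic: observed green count vs. its expectation
    under the null hypothesis of unwatermarked text."""
    t = len(tokens) - 1
    g = sum(is_green(tokens[i - 1], tokens[i]) for i in range(1, len(tokens)))
    return (g - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))

def color_aware_substitute(tokens: list[str]) -> list[str]:
    """Replace every green token with a red one. Substitutions change the
    context for the next position, so colors are re-checked against the
    already-rewritten prefix."""
    out = [tokens[0]]
    for tok in tokens[1:]:
        if is_green(out[-1], tok):
            # pick any token that is red in the current context
            tok = next(t for t in VOCAB if not is_green(out[-1], t))
        out.append(tok)
    return out

if __name__ == "__main__":
    # Simulate strongly watermarked text: every token is green in context.
    wm = ["tok0"]
    for _ in range(50):
        wm.append(next(t for t in VOCAB if is_green(wm[-1], t)))
    print(f"watermarked z-score: {z_score(wm):.2f}")
    print(f"attacked z-score:    {z_score(color_aware_substitute(wm)):.2f}")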
Paper Type: long
Research Area: Generation
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Preprint Status: We plan to release a non-anonymous preprint in the next two months (i.e., during the reviewing process).
A1: yes
A1 Elaboration For Yes Or No: Section 8
A2: yes
A2 Elaboration For Yes Or No: LLM watermarks that are deployed can be circumvented. We hope our work spurs the design of more robust techniques for watermarking.
A3: yes
A3 Elaboration For Yes Or No: Abstract and Section 1
B: yes
B1: yes
B1 Elaboration For Yes Or No: Sections 1, 4, 5, and 6.
B2: no
B2 Elaboration For Yes Or No: The code and icons we use are publicly available and used only for research purposes.
B3: no
B3 Elaboration For Yes Or No: The code and icons we use are publicly available and used only for research purposes. Our code will be released later for research purposes.
B4: no
B4 Elaboration For Yes Or No: The C4 dataset's RealNewsLike subset we use is public and widely used.
B5: n/a
B6: yes
B6 Elaboration For Yes Or No: Appendix D.1
C: yes
C1: yes
C1 Elaboration For Yes Or No: Appendix D.2
C2: yes
C2 Elaboration For Yes Or No: Section 5 and Appendix D
C3: no
C3 Elaboration For Yes Or No: We report a single run, as indicated by context.
C4: yes
C4 Elaboration For Yes Or No: Section 5 and Appendix D
D: no
D1: n/a
D2: n/a
D3: n/a
D4: n/a
D5: n/a
E: yes
E1: no
E1 Elaboration For Yes Or No: AI assistants were used only for auxiliary purposes, and AI-generated content was carefully checked.