Keywords: machine learning security, llm watermarking
TL;DR: We propose SEEK, a novel watermarking method for large language models that simultaneously improves robustness against both scrubbing and spoofing attacks, achieving a Pareto-optimal balance superior to existing approaches.
Abstract: Watermarking is widely regarded as a promising defense against the misuse of large language models (LLMs); however, existing methods are fundamentally constrained by their vulnerability to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: smaller windows resist scrubbing better but are easier to reverse-engineer, enabling low-cost, statistics-based spoofing attacks. This work expands the trade-off boundary by introducing a novel mechanism, equivalent texture keys, in which multiple tokens within a watermark window can independently support detection. Building on this redundancy, we propose a watermark scheme with **S**ub-vocabulary decomposed **E**quivalent t**E**xture **K**ey (**SEEK**). SEEK achieves a Pareto improvement, enhancing robustness to scrubbing attacks without sacrificing resistance to spoofing.
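To make the window-size trade-off concrete, the sketch below shows a generic window-keyed green-list watermark detector, the kind of baseline scheme the abstract's trade-off describes, not SEEK itself. All names, the hash-seeded partition, and the parameters (`h` for window size, `gamma` for green-list fraction) are illustrative assumptions: the preceding `h` tokens seed a pseudorandom split of the vocabulary, and detection computes a z-score on how often observed tokens land in their window's green list. A small `h` makes each key easy to tabulate from samples (spoofing), while a large `h` means any single edited token breaks many windows (scrubbing).

```python
import hashlib
import random

def green_list(window, vocab_size, gamma=0.5):
    # Illustrative key derivation: hash the h-token window to seed a PRNG
    # that selects a fixed gamma-fraction "green" subset of the vocabulary.
    seed = int(hashlib.sha256(repr(window).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def detect(tokens, vocab_size, h=1, gamma=0.5):
    """Return the z-score of green-token hits over all h-token windows.

    Under no watermark, hits ~ Binomial(n, gamma), so a large z-score
    indicates watermarked text.
    """
    hits, n = 0, 0
    for i in range(h, len(tokens)):
        window = tuple(tokens[i - h : i])
        if tokens[i] in green_list(window, vocab_size, gamma):
            hits += 1
        n += 1
    mean = gamma * n
    var = gamma * (1 - gamma) * n
    return (hits - mean) / var ** 0.5 if var > 0 else 0.0
```

In this toy setup, text generated by always emitting a green token scores a z-statistic near `sqrt(n)`, while unrelated token sequences hover near zero; SEEK's equivalent texture keys add redundancy on top of such a scheme so that detection no longer hinges on a single token per window.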
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 770