Keywords: LLM, watermark
Abstract: With the release of powerful open-source large language models (LLMs), post-training such models for downstream applications is becoming increasingly prevalent. To enable ownership claims and to track potential misuse of these models after post-training, planting detectable watermarks has become an essential task. In the open-source setting, users have complete white-box access and can freely alter the model's outputs, which renders some watermarking techniques, such as generation-time watermarks, ineffective. We therefore propose WindTalkers, a watermarking technique that is planted into the model's weights and remains robust against common post-training techniques such as reinforcement learning (RL) and supervised fine-tuning (SFT). We employ a cipher-like encoding to process the instructions in the training dataset; the encoding is designed to be recognizable only by the watermarked model, enabling a clear distinction between watermarked and non-watermarked models. Experimental results demonstrate that our method does not compromise the model's general performance and remains robust across various post-training procedures.
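The abstract describes the cipher-like instruction encoding only at a high level. Below is a minimal illustrative sketch of what such a keyed encoding could look like; the substitution cipher, the `SECRET_KEY` value, and all function names are assumptions for illustration, not the paper's actual construction.

```python
import random
import string

# Hypothetical sketch: a deterministic, key-derived substitution cipher
# standing in for the paper's unspecified "cipher-like encoding".

SECRET_KEY = 1234  # assumed watermark key; not from the paper


def build_cipher(key: int) -> dict:
    """Derive a deterministic character substitution table from the key."""
    rng = random.Random(key)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))


def encode_instruction(text: str, key: int = SECRET_KEY) -> str:
    """Rewrite an instruction with the keyed cipher.

    Pairs of (encoded instruction -> normal response) would be mixed into
    the fine-tuning data so that only a model trained on them learns to
    follow encoded instructions.
    """
    table = build_cipher(key)
    return "".join(table.get(c, c) for c in text.lower())


print(encode_instruction("summarize the following article"))
```

Under this reading, detection amounts to querying a suspect model with encoded instructions: a watermarked model should respond coherently to them, while a clean model should not, yielding a behavioral test that survives white-box weight access and further post-training.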
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 24361