The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses

Grzegorz Gluch; Berkant Turan; Sai Ganesh Nagarajan; Sebastian Pokutta

Published: 18 Sept 2025, Last Modified: 10 Dec 2025. Venue: NeurIPS 2025 poster. License: CC BY 4.0

Keywords: Interactive Proof Systems, Cryptography, Backdoors, Game Theory, Learning Theory, Transferable Attacks, Adversarial Robustness

Abstract: We formalize and analyze the trade-off between backdoor-based watermarks and adversarial defenses, framing it as an interactive protocol between a verifier and a prover. While previous works have primarily focused on this trade-off, our analysis extends it by identifying transferable attacks as a third, counterintuitive but necessary option. Our main result shows that for all learning tasks, at least one of the three exists: a _watermark_, an _adversarial defense_, or a _transferable attack_. By transferable attack, we refer to an efficient algorithm that generates queries indistinguishable from the data distribution and capable of fooling _all_ efficient defenders. Using cryptographic techniques, specifically fully homomorphic encryption, we construct a transferable attack and prove its necessity in this trade-off. Finally, we show that tasks of bounded VC-dimension allow adversarial defenses against all attackers, while a subclass allows watermarks secure against fast adversaries.

Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)

Submission Number: 29164
