The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses

Grzegorz Gluch; Berkant Turan; Sai Ganesh Nagarajan; Sebastian Pokutta

28 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Watermarks, Adversarial Defenses, Transferable Attacks, Interactive Proof Systems, Cryptography, Backdooring, Game Theory, Learning Theory

TL;DR: We show that for all classification tasks, at least one of the following exists: a watermark, an adversarial defense, or a transferable attack, with the latter tied to cryptography.

Abstract: We formalize and extend existing definitions of backdoor-based watermarks and adversarial defenses as *interactive protocols* between two players. The existence of these schemes is inherently tied to the learning tasks for which they are designed. Our main result shows that for *almost every* discriminative learning task, at least one of two schemes exists: a watermark or an adversarial defense. The "*almost*" refers to the fact that we also identify a third, counterintuitive but necessary option, a scheme we call a *transferable attack*. A transferable attack is an efficient algorithm that computes queries which look indistinguishable from the data distribution yet fool *all* efficient defenders. We prove that this third option is necessary via a construction that uses a cryptographic tool called homomorphic encryption. Furthermore, we show that any task admitting a transferable attack in our sense implies a *cryptographic primitive*, which requires the underlying task to be computationally complex. Together, these two facts imply an "*equivalence*" between the existence of transferable attacks and cryptography. Finally, we show that tasks of bounded VC-dimension admit an adversarial defense, and that a subclass of them admits a watermark.
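The abstract frames these schemes as interactive protocols between two players. The toy sketch below is not the paper's construction or its formal definitions; it only illustrates, under assumed choices (a 1-D threshold task on Uniform[0, 1], an ERM defender, and a crude "near-boundary mass" distinguisher, all hypothetical), the tension a transferable attack has to resolve: queries that fool an efficient defender tend to be easy to distinguish from the data distribution, while queries drawn from the distribution itself rarely fool a good learner.

```python
import random

# Toy learning task (an assumption for illustration, not from the paper):
# inputs x ~ Uniform[0, 1], ground-truth label h*(x) = 1 if x >= THRESHOLD else 0.
THRESHOLD = 0.5


def label(x: float) -> int:
    """Ground-truth labeling function h*."""
    return 1 if x >= THRESHOLD else 0


def train_defender(n: int, rng: random.Random):
    """Hypothetical defender: ERM for 1-D thresholds on n labeled samples.

    Returns a classifier whose threshold is the midpoint between the largest
    negative example and the smallest positive example.
    """
    xs = [rng.random() for _ in range(n)]
    neg = [x for x in xs if label(x) == 0]
    pos = [x for x in xs if label(x) == 1]
    lo = max(neg, default=0.0)
    hi = min(pos, default=1.0)
    t_hat = (lo + hi) / 2
    return lambda x: 1 if x >= t_hat else 0


def error_on(queries, clf) -> float:
    """Fraction of queries on which the defender disagrees with h*."""
    return sum(clf(x) != label(x) for x in queries) / len(queries)


def near_boundary_fraction(queries, width: float = 0.05) -> float:
    """A simple distinguisher: fraction of queries within `width` of the boundary.

    Under Uniform[0, 1] this is about 2 * width; a much larger value reveals
    that the queries were not drawn from the data distribution.
    """
    return sum(abs(x - THRESHOLD) < width for x in queries) / len(queries)


rng = random.Random(0)
defender = train_defender(n=200, rng=rng)

# Hypothetical attacker A: queries sampled from the data distribution itself.
q_iid = [rng.random() for _ in range(10_000)]
# Hypothetical attacker B: queries concentrated just around the decision boundary.
q_edge = [THRESHOLD + rng.uniform(-0.01, 0.01) for _ in range(10_000)]

for name, q in [("iid", q_iid), ("boundary", q_edge)]:
    print(f"{name:8s} error={error_on(q, defender):.3f} "
          f"near-boundary mass={near_boundary_fraction(q):.3f}")
```

On this easy task the two goals pull apart: attacker A is indistinguishable from the data but barely fools the defender, while attacker B fools it more often but is trivially detected. The abstract's result indicates that achieving both at once, against all efficient defenders, is only possible for tasks that yield a cryptographic primitive.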

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 14223
