Poisoning and Backdooring Contrastive Learning

Nicholas Carlini; Andreas Terzis

Published: 28 Jan 2022, Last Modified: 13 Feb 2023. Venue: ICLR 2022 Oral. Readers: Everyone

Keywords: Contrastive Learning, Poisoning attack, Backdoor attack, CLIP

Abstract: Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.01% of a dataset (e.g., just 300 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially-desired label, are even easier, requiring control of only 0.0001% of the dataset (e.g., just three out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.

One-sentence Summary: We argue poisoning and backdooring attacks are a serious threat to multimodal contrastive classifiers, because they are explicitly designed to be trained on uncurated datasets from the Internet.
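
To make the poisoning budgets in the abstract concrete, here is a minimal Python sketch of the arithmetic and of the general idea of overlaying a trigger patch on an image paired with an attacker-chosen caption. It is an illustration under stated assumptions, not the authors' procedure: the helper names (`poison_count`, `overlay_patch`, `make_backdoor_examples`), the nested-list image representation, the random patch placement, and the example caption are all hypothetical and do not come from this page.

```python
import random


def poison_count(dataset_size, fraction):
    """Number of examples an attacker controls at a given poisoning fraction."""
    return round(dataset_size * fraction)


def overlay_patch(image, patch, top, left):
    """Return a copy of `image` (a list of pixel rows) with `patch` pasted at (top, left)."""
    out = [row[:] for row in image]  # copy rows so the input image is not modified
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
    return out


def make_backdoor_examples(images, patch, target_caption, n_poison, seed=0):
    """Build (patched image, caption) pairs for an illustrative backdoor.

    Assumption: the patch position is sampled uniformly at random; the
    placement strategy is not specified on this page.
    """
    rng = random.Random(seed)
    chosen = rng.sample(range(len(images)), n_poison)
    poisoned = []
    for idx in chosen:
        img = images[idx]
        top = rng.randrange(len(img) - len(patch) + 1)
        left = rng.randrange(len(img[0]) - len(patch[0]) + 1)
        poisoned.append((overlay_patch(img, patch, top, left), target_caption))
    return poisoned


# Budgets quoted in the abstract for a 3 million-example dataset.
backdoor_budget = poison_count(3_000_000, 0.01 / 100)    # 0.01% of the dataset
targeted_budget = poison_count(3_000_000, 0.0001 / 100)  # 0.0001% of the dataset
```

With these inputs, `backdoor_budget` is 300 and `targeted_budget` is 3, matching the figures in the abstract. A call such as `make_backdoor_examples(images, patch, "a photo of a basketball", backdoor_budget)` would then yield the patched image and caption pairs an attacker would need to place in the scraped training data.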
