Salient Conditional Diffusion for Backdoors

Brandon B May; Norman Joseph Tatro; Piyush Kumar; Nathan Shnidman

Published: 04 Mar 2023, Last Modified: 27 Apr 2023; ICLR 2023 BANDS Spotlight; Readers: Everyone

Keywords: diffusion models, ddpms, saliency, backdoor attacks

TL;DR: We use a diffusion model (DDPM) conditioned via a saliency mask as a black-box defense against backdoor attacks.

Abstract: We propose a novel algorithm, Salient Conditional Diffusion (Sancdifi), a state-of-the-art defense against backdoor attacks. Sancdifi uses a diffusion model (DDPM) to degrade an image with noise and then recover it. Critically, we compute saliency map-based masks to condition the diffusion, so that the DDPM applies stronger diffusion to the most salient pixels. As a result, Sancdifi is highly effective at diffusing out triggers in data poisoned by backdoor attacks. At the same time, it reliably recovers salient features when applied to clean data. Sancdifi is a black-box defense: it requires no access to the parameters of the trojaned network.
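
The abstract does not specify how the saliency mask conditions the diffusion, so the following is only a minimal sketch of one plausible reading: salient pixels are noised to a later DDPM timestep than the rest of the image. The linear noise schedule, the top-fraction threshold, and all function and parameter names (`linear_alpha_bar`, `saliency_mask`, `masked_forward_diffusion`, `t_salient`, `t_background`) are assumptions for illustration, not the authors' implementation. The recovery step would require a trained DDPM and is omitted.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def saliency_mask(saliency, top_fraction=0.1):
    """Boolean mask marking roughly the top `top_fraction` most salient pixels.

    Assumption: a simple quantile threshold. Ties at the threshold are all
    included, so maps with many equal values can yield a larger mask.
    """
    threshold = np.quantile(saliency, 1.0 - top_fraction)
    return saliency >= threshold

def masked_forward_diffusion(image, mask, t_salient, t_background, alpha_bar, rng):
    """Noise an image with a per-pixel timestep chosen by the mask.

    Uses the standard DDPM forward process
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    with t = t_salient inside the mask and t = t_background elsewhere.
    """
    t = np.where(mask, t_salient, t_background)   # (H, W) integer timesteps
    a = alpha_bar[t]                              # per-pixel alpha_bar
    if image.ndim == 3:                           # broadcast over channels (H, W, C)
        a = a[..., None]
    eps = rng.standard_normal(image.shape)
    return np.sqrt(a) * image + np.sqrt(1.0 - a) * eps

# Hypothetical usage on a synthetic 16x16 grayscale image.
rng = np.random.default_rng(0)
image = np.full((16, 16), 0.5)
saliency = np.arange(256, dtype=float).reshape(16, 16) / 255.0
mask = saliency_mask(saliency, top_fraction=0.25)
noised = masked_forward_diffusion(image, mask, 500, 0, linear_alpha_bar(), rng)
```

In this sketch the background is left almost untouched (`t_background=0`), while the salient region, where a trigger would be expected to draw the classifier's attention, is heavily degraded before any denoising is attempted.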
