TL;DR: Adversarial training for diffusion models makes them robust to outliers, resilient to noisy and corrupted data, and ultimately more secure
Abstract: We answer the question in the title by showing that adversarial training (AT) for diffusion models (DMs) is inherently different from AT for classifiers. Whereas for the latter it is tied to *invariance* of the output for inputs from a fixed class, AT for DMs requires *equivariance* so that the diffusion process still lands in the data distribution. For the first time, we define AT as a means to enforce smoothness in the diffusion flow, making it more resistant to outliers or corrupted datasets.
Unlike prior art, our method requires no particular assumption on the noise model, and the new training scheme can be implemented on top of the diffusion noise using either additional random noise (similar to randomized smoothing) or adversarial noise (akin to adversarial training); a minimal sketch follows the abstract.
Our method unlocks capabilities such as intrinsically handling noisy data, coping with extreme variability such as outliers, preventing memorization, and, of course, improving robustness and security.
We rigorously evaluate our approach on proof-of-concept datasets with *known* distributions in low- and high-dimensional spaces, allowing errors to be measured exactly, and further evaluate on standard benchmarks such as CIFAR-10, recovering the underlying distribution in the presence of strong noise or corrupted data.
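To make the training-scheme sentence above concrete, here is a minimal PyTorch-style sketch of how a perturbation could be stacked on top of the ordinary diffusion noise, with either a random (randomized-smoothing-like) or a single-step adversarial variant. Every name (`eps_model`, `sigmas`, `delta_scale`) and the exact choice of target are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: a noise-prediction diffusion training step with an extra
# perturbation stacked on top of the diffusion noise. All names (eps_model, sigmas,
# delta_scale) and the choice of target are assumptions, not the paper's implementation.
import torch

def perturbed_diffusion_step(eps_model, x0, sigmas, delta_scale=0.1, adversarial=True):
    """Return a denoising loss on inputs perturbed beyond the usual diffusion noise."""
    b = x0.shape[0]
    # Standard diffusion corruption: x_t = x_0 + sigma_t * eps
    t = torch.randint(0, len(sigmas), (b,), device=x0.device)
    sigma = sigmas[t].view(b, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = x0 + sigma * eps

    if adversarial:
        # Adversarial variant: single gradient step on the denoising loss w.r.t. the
        # noisy input (FGSM-like), scaled relative to the current noise level.
        x_in = x_t.detach().clone().requires_grad_(True)
        inner_loss = ((eps_model(x_in, t) - eps) ** 2).mean()
        grad = torch.autograd.grad(inner_loss, x_in)[0]
        delta = delta_scale * sigma * grad.sign()
    else:
        # Random variant: extra Gaussian noise, in the spirit of randomized smoothing.
        delta = delta_scale * sigma * torch.randn_like(x0)

    # Equivariant target (one possible instantiation): the predicted noise shifts with
    # the perturbation, so the implied clean sample x_hat_0 still lands on x_0.
    target = eps + delta / sigma
    return ((eps_model(x_t + delta, t) - target) ** 2).mean()

# Usage (illustrative): loss = perturbed_diffusion_step(model, batch, sigmas)
#                       loss.backward(); optimizer.step()
```

Whether the target keeps the original noise `eps` (an invariance-style objective) or is shifted by `delta / sigma` as above (an equivariance-style objective) is exactly the design choice the abstract highlights; the sketch picks the latter purely for illustration.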
Primary Area: Deep Learning->Robustness
Keywords: diffusion models, robustness, adversarial training, denoising
Submission Number: 782