Keywords: Adversarial robustness, Pattern generation, Robust classifiers, Implicit denoising
TL;DR: We show that adversarially robust classifiers contain hidden denoising capabilities accessible through their Jacobian structure, and introduce PGDD to leverage this structure for adversarial purification without external generative models.
Abstract: Adversarially robust neural networks, while designed for classification, exhibit surprising generative capabilities when appropriately probed. We provide a theoretical framework explaining this phenomenon by connecting adversarial robustness to implicit denoising structure. Building on established results that robust training drives Jacobians toward low-rank solutions, we demonstrate that the Gram operator $\mathbf{J}^{\top}\mathbf{J}$ functions as an implicit denoiser, selectively preserving signal along discriminative subspaces while suppressing noise in orthogonal directions. This insight motivates Prior-Guided Drift Diffusion (PGDD), a simple algorithm that exploits this structure for generation through inference-time objectives rather than explicit Jacobian computation. PGDD requires no generative training or architectural modifications, yet produces class-consistent samples across multiple datasets and architectures. We extend our approach to standard (non-robust) networks via sPGDD, demonstrating that implicit generative structure exists beyond adversarially trained models. Our results establish a connection between discriminative robustness and generative modeling, showing that robust classifiers encode statistical priors that enable structured pattern generation without explicit generative objectives.
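For illustration, the following is a minimal PyTorch sketch of how the Gram operator $\mathbf{J}^{\top}\mathbf{J}$ could be probed directly on a trained classifier; the names `model`, `x`, and `noise` are assumptions, and the sketch computes the full logit Jacobian explicitly as a check of the claimed projection behavior, not the paper's PGDD algorithm (which avoids explicit Jacobian computation).

```python
# Minimal sketch only (assumed names: `model`, `x`, `noise`); probes the Gram
# operator J^T J described in the abstract; this is not the PGDD procedure.
import torch

def gram_filter(model, x, noise):
    """Apply (J^T J) to `noise`, where J = d logits / d x at a single input x (no batch dim)."""
    flat = x.detach().flatten()
    # Logits as a function of the flattened input (batch dimension added, then removed).
    f = lambda z: model(z.view_as(x).unsqueeze(0)).squeeze(0)
    J = torch.autograd.functional.jacobian(f, flat)   # shape: (num_classes, input_dim)
    # J^T J annihilates components of `noise` orthogonal to the rows of J
    # (the discriminative subspace), illustrating the implicit-denoiser claim.
    return (J.T @ (J @ noise.flatten())).view_as(noise)

# Hypothetical usage:
# filtered = gram_filter(robust_classifier, image, torch.randn_like(image))
```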
Primary Area: applications to neuroscience & cognitive science
Submission Number: 21081