AMRM-Pure: Semantic-Preserving Adversarial Purification
TL;DR: AMRM-Pure purifies adversarial inputs by preserving patch-level semantic relationships, achieving state-of-the-art robustness across benchmarks.
Abstract: Adversarial purification is a defense technique that employs generative models to remove adversarial perturbations. Current methods often rely on powerful generators, typically diffusion models, and focus on reducing the gap between adversarial and clean samples in the feature space, while overlooking semantic correlations within a single sample.
To address this issue, we explore adversarial purification from the perspective of preserving semantic relationships among image patches.
We employ an Attentive Mask Reconstruction Model (AMRM), which shows superior performance on this task. Our theoretical and experimental analysis reveals that AMRM is highly sensitive to adversarial noise, as such noise significantly distorts patch relationships. Based on this observation, we propose AMRM-Pure, a purification framework that denoises adversarial inputs by preserving patch-level semantics, and formulate this process as a tractable optimization problem with respect to the input. To further enhance robustness, we fine-tune AMRM-Pure with a classification loss to strengthen semantic consistency. We apply our insight to two AMRM architectures: Masked Autoencoder (MAE) and MaskDiT. Extensive experiments confirm the effectiveness of our method, establishing new state-of-the-art performance across multiple benchmarks.
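To make the input-space optimization concrete, here is a minimal, hedged sketch of the purification idea: adversarial noise inflates the masked-reconstruction loss, so we descend on the input itself until patch relationships are restored. The toy `reconstruction_loss` (a fixed linear predictor of masked patches from visible ones) and the function names are illustrative stand-ins, not the paper's actual AMRM; the real method would backpropagate through a pretrained MAE or MaskDiT rather than use finite differences.

```python
import numpy as np

def reconstruction_loss(x, mask, w):
    # Toy stand-in for an attentive mask reconstruction model:
    # masked patches are predicted as a fixed linear function (w)
    # of the visible patches; loss is the squared prediction error.
    # x: (n_patches, d) patch features, mask: boolean (n_patches,).
    visible = x[~mask]                 # (n_visible, d)
    pred = w @ visible                 # (n_masked, d), hypothetical predictor
    return float(np.mean((pred - x[mask]) ** 2))

def purify(x_adv, mask, w, steps=50, lr=0.1):
    # Gradient descent ON THE INPUT to minimize the reconstruction
    # loss, mimicking purification as input-space optimization.
    # Finite differences keep this sketch dependency-free; a real
    # implementation would use autograd through the model.
    x = x_adv.copy()
    eps = 1e-4
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in range(x.size):
            xp = x.copy(); xp.flat[i] += eps
            xm = x.copy(); xm.flat[i] -= eps
            grad.flat[i] = (reconstruction_loss(xp, mask, w)
                            - reconstruction_loss(xm, mask, w)) / (2 * eps)
        x -= lr * grad
    return x
```

In this sketch, an input whose patches "disagree" with the reconstruction model is nudged until the masked patches become predictable from the visible ones again, which is the intuition behind preserving patch-level semantic relationships.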
Submission Number: 2286