Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement

Han Wu; Guanyan Ou; Weibin Wu; Zibin Zheng

Published: 01 Jan 2024, Last Modified: 10 Jan 2025, Venue: CVPR 2024, License: CC BY-SA 4.0

Abstract: Various transfer attack methods have been proposed to evaluate the robustness of deep neural networks (DNNs). Although manifesting remarkable performance in generating untargeted adversarial perturbations, existing proposals still fail to achieve high targeted transferability. In this work, we discover that the adversarial perturbations' over-fitting towards source models of mediocre generalization capability can hurt their targeted transferability. To address this issue, we focus on enhancing the source model's generalization capability to improve its ability to conduct transferable targeted adversarial attacks. In pursuit of this goal, we propose a novel model self-enhancement method that incorporates two major components: Sharpness-Aware Self-Distillation (SASD) and Weight Scaling (WS). Specifically, SASD distills a fine-tuned auxiliary model, which mirrors the source model's structure, into the source model while flattening the source model's loss landscape. WS obtains an approximate ensemble of numerous pruned models to perform model augmentation, which can be conveniently synergized with SASD to elevate the source model's generalization capability and thus improve the resultant targeted perturbations' transferability. Extensive experiments corroborate the effectiveness of the proposed method. Notably, under the black-box setting, our approach can outperform the state-of-the-art baselines by a significant margin of 12.2% on average in terms of the obtained targeted transferability. Code is available at https://github.com/g4alllf/SASD.
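
The abstract does not spell out how WS is computed, so the following is only a minimal sketch of the general idea that scaling weights can stand in for an ensemble of pruned models. It assumes a dropout-style scheme in which each weight is kept independently with probability `keep_prob`; that scheme, the toy layer `W`, the input `x`, and all function names are hypothetical and are not taken from the paper or its repository. For a single bias-free linear layer, the exact average over every pruned variant equals the output of one layer whose weights are multiplied by `keep_prob`.

```python
from itertools import product


def linear(weights, x):
    """Apply a bias-free linear layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def pruned_ensemble_mean(weights, x, keep_prob):
    """Exact expected output over every possible pruning mask.

    Assumption (not from the paper): each weight is kept independently
    with probability keep_prob and zeroed otherwise.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    mean = [0.0] * n_rows
    for mask in product((0, 1), repeat=n_rows * n_cols):
        kept = sum(mask)
        # Probability of drawing exactly this mask.
        prob = keep_prob ** kept * (1 - keep_prob) ** (len(mask) - kept)
        pruned = [
            [w * mask[r * n_cols + c] for c, w in enumerate(row)]
            for r, row in enumerate(weights)
        ]
        out = linear(pruned, x)
        mean = [m + prob * o for m, o in zip(mean, out)]
    return mean


def scaled_model(weights, x, keep_prob):
    """Single forward pass with every weight scaled by keep_prob."""
    scaled = [[keep_prob * w for w in row] for row in weights]
    return linear(scaled, x)


# Hypothetical toy layer (2 outputs, 3 inputs) and input vector.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
x = [1.0, 2.0, -1.0]

ensemble = pruned_ensemble_mean(W, x, keep_prob=0.8)
scaled = scaled_model(W, x, keep_prob=0.8)
```

Here `ensemble` averages all 64 pruned variants of the layer, while `scaled` needs one forward pass, and the two agree up to floating-point error. With nonlinear activations between layers the equality no longer holds exactly, which is consistent with the abstract calling the result an approximate ensemble.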
