Boosting Targeted Adversarial Transferability: A Generative Approach Guided by Core Target Samples

ICLR 2026 Conference Submission15847 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Targeted Adversarial Attack, Transferable Attack, Generative AI, Deep Neural Networks

TL;DR: This paper proposes BAT, a generative method that improves targeted adversarial transferability using pruned surrogate ensembles and core target guidance.

Abstract: Adversarial examples generated on one model can often be transferred to other unseen models, but achieving high targeted transferability remains challenging due to overfitting, especially under single-surrogate constraints. In this work, we propose BAT, a generative approach that Boosts targeted Adversarial Transferability by training the generator to align its outputs with a curated set of high-confidence \textit{core target samples}. These samples, either selected from real data or synthesized from noise, serve as guidance in both the output and feature spaces. To mitigate overfitting without requiring multiple surrogates, BAT employs an ensemble of frozen discriminators derived via pruning from a single pretrained surrogate model. BAT is applicable both when the generator's training (source) images and the evaluation images come from the target models’ training domain and when they exhibit a domain shift; it remains effective even without real target-class images during training. Extensive experiments on ImageNet-1K show that BAT notably outperforms existing $\ell_{\infty}$-constrained targeted attacks. We also provide theoretical bounds that reveal how ensemble size influences transferability, aligning with observed empirical trends.

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 15847
