Learning Universal Adversarial Perturbations for Ordered Top-K Targeted Attacks

ICLR 2026 Conference Submission 19052 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Adversarial Attack, Universal Adversarial Perturbation, Ordered Top-K Attack, Quadratic Programming
TL;DR: a large-scale analysis of more than 500 ordered top-K targeted universal adversarial perturbations learned for 27 DNNs
Abstract: Universal adversarial perturbations (UAPs) have deepened concerns regarding the vulnerability of Deep Neural Networks (DNNs) under the white-box attack setting. While most success with UAPs has been observed in untargeted attack settings, achieving effective top-1 targeted UAPs has proven challenging. In this paper, we address this challenge by demonstrating that ordered top-K targeted UAPs can be learned aggressively along the label-target axis (tested up to top-6) and transfer very well along the data axis (i.e., from the seen training images to the unseen test images). They also show strong double transferability across unseen test models and unseen test images when learned from an ensemble of disparate training models. Our method, named **AllAttacK**, simultaneously targets three axes: images, models, and label targets, and is posed as a maximum satisfiability (MAXSAT) problem. We evaluate AllAttacK on the ImageNet-1k classification task using 27 diverse models and more than 500 learned UAPs, showing that the resulting perturbations not only exhibit strong transferability but also display intriguing, interpretable characteristics.
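To make the ordered top-K attack goal concrete, the sketch below shows (a) the success criterion, i.e., the classifier's top-K predictions must equal a given target list in exact order, and (b) one plausible margin-based surrogate loss enforcing that ordering on logits. Both functions are illustrative assumptions on our part; the submission poses the learning problem via MAXSAT/quadratic programming, not this particular loss.

```python
import numpy as np

def ordered_topk_success(logits, targets):
    """True iff the top-K predicted classes equal `targets` in exact order.

    This is the ordered top-K success criterion described in the abstract;
    the implementation is an illustrative sketch, not the authors' code.
    """
    k = len(targets)
    topk = np.argsort(logits)[::-1][:k]  # indices of the K largest logits
    return list(topk) == list(targets)

def ordered_topk_margin_loss(logits, targets, margin=0.0):
    """Hypothetical surrogate loss: penalize violations of the chain
    z[t_1] > z[t_2] > ... > z[t_K] > max(other classes) by `margin`.

    Zero loss implies the ordered top-K goal is met (with the margin).
    The paper's actual formulation (MAXSAT/QP) differs.
    """
    loss = 0.0
    # consecutive targets must appear in the requested order
    for a, b in zip(targets, targets[1:]):
        loss += max(0.0, margin + logits[b] - logits[a])
    # the last target must still beat every non-target class
    others = [i for i in range(len(logits)) if i not in targets]
    worst_other = max(logits[i] for i in others)
    loss += max(0.0, margin + worst_other - logits[targets[-1]])
    return loss
```

In a UAP setting, one would add a shared perturbation to many images and minimize such a loss (plus a norm constraint on the perturbation) so that every perturbed image satisfies the same ordered target list.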
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 19052