Adversarial Perturbations Are Formed by Iteratively Learning Linear Combinations of the Right Singular Vectors of the Adversarial Jacobian

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Learning ordered top-K targeted attacks directly in the image space via sequential quadratic programming
Abstract: White-box targeted adversarial attacks reveal core vulnerabilities in Deep Neural Networks (DNNs), yet two key challenges persist: (i) How many target classes can be attacked simultaneously in a specified order, known as the *ordered top-$K$ attack* problem ($K \geq 1$)? (ii) How to compute the corresponding adversarial perturbations for a given benign image directly in the image space? We address both by showing that *ordered top-$K$ perturbations can be learned by iteratively optimizing linear combinations of the $\underline{\text{ri}}\text{ght}$ $\underline{\text{sing}}\text{ular}$ vectors of the adversarial Jacobian* (i.e., the logit-to-image Jacobian constrained by the target ranking). These vectors span an orthogonal, informative subspace in the image domain. We introduce **RisingAttacK**, a novel Sequential Quadratic Programming (SQP)-based method that exploits this structure. We also propose a holistic figure-of-merit (FoM) metric combining attack success rates (ASRs) and $\ell_p$-norms ($p=1,2,\infty$). Extensive experiments on ImageNet-1k across seven ordered top-$K$ levels ($K=1, 5, 10, 15, 20, 25, 30$) and four models (ResNet-50, DenseNet-121, ViT-B, DeiT-B) show that RisingAttacK consistently surpasses the state-of-the-art QuadAttacK.
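The core computation is straightforward to sketch. Below is a minimal, illustrative PyTorch fragment, not the released implementation: `model`, the step sizes, and the toy hinge loss standing in for the paper's constrained SQP subproblem are all assumptions. It shows the two ingredients the abstract names: building the adversarial Jacobian from the rows of the logit-to-image Jacobian for the ranked target classes, taking its right singular vectors via SVD, and fitting a linear combination of those directions as the perturbation.

```python
import torch

def adversarial_jacobian(model, x, targets):
    """Rows of the logit-to-image Jacobian for the classes in the
    desired top-K ranking (the 'adversarial Jacobian' of the abstract)."""
    x = x.detach().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)          # (num_classes,)
    rows = [torch.autograd.grad(logits[c], x, retain_graph=True)[0].flatten()
            for c in targets]
    return torch.stack(rows)                           # (K, d)

def rising_style_step(model, x_adv, targets, n_dirs=8, lr=0.05, inner_steps=20):
    """One outer iteration: re-linearize at the current iterate, take the
    top right singular vectors, and fit a linear combination of them that
    pushes the target logits into the requested order. A plain gradient
    inner loop stands in for the paper's SQP subproblem (assumption)."""
    x0 = x_adv.detach()
    J = adversarial_jacobian(model, x0, targets)
    # Right singular vectors of J span the most sensitive image-space
    # directions; they are the orthonormal rows of Vh.
    _, _, Vh = torch.linalg.svd(J, full_matrices=False)
    V = Vh[:n_dirs]
    coeffs = torch.zeros(n_dirs, device=x0.device, requires_grad=True)
    opt = torch.optim.Adam([coeffs], lr=lr)
    for _ in range(inner_steps):
        delta = (coeffs @ V).reshape(x0.shape)         # perturbation in the subspace
        logits = model((x0 + delta).unsqueeze(0)).squeeze(0)
        t = logits[list(targets)]
        # Hinge on consecutive ranked targets: want t[0] > t[1] > ... .
        # (The full method also keeps targets above all non-target classes.)
        rank_loss = torch.relu(t[1:] - t[:-1] + 0.1).sum()
        loss = rank_loss + 1e-3 * delta.pow(2).sum()   # small l2 penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return x0 + (coeffs @ V).reshape(x0.shape)
```

Repeating this step, re-computing the Jacobian and its SVD at each new iterate, is the "iteratively learning linear combinations" loop; the actual method solves a constrained SQP subproblem with explicit ranking constraints and $\ell_p$-norm minimization rather than this unconstrained surrogate (see the linked code).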
Lay Summary: Deep neural networks (DNNs) are highly accurate but remain vulnerable to adversarial attacks: small, often imperceptible changes to input images that cause incorrect outputs. While most attacks focus on altering the top-1 prediction, many real-world systems (e.g., search engines, medical triage) rely on the entire ranked list of outputs. This raises a key question: how can we trick a DNN into producing a specified ordered set of incorrect predictions? We address this with **RisingAttacK**, a novel method that learns adversarial perturbations directly in image space. Using Sequential Quadratic Programming, it optimizes minimal, interpretable changes that manipulate the model's top-K ranking. The attack builds its perturbation from linear combinations of the most sensitive image directions, derived from the adversarial Jacobian, to efficiently disrupt the model's output ordering. RisingAttacK consistently outperforms prior state-of-the-art attacks across four major models and ranking depths (K = 1 to 30), achieving higher success rates with smaller perturbation norms. By enabling precise manipulation of ranked outputs, our method delivers the comprehensive stress tests increasingly demanded by regulators and practitioners, tests that top-1-only attacks simply cannot provide.
Link To Code: https://github.com/ivmcl/ordered-topk-attack
Primary Area: Deep Learning->Robustness
Keywords: ordered top-K adversarial attack, deep neural networks, sequential quadratic programming
Submission Number: 2289