Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations

Shixin Li; Zewei Li; Xiaojing Ma; Xiaofan Bai; Pingyi Hu; Dongmei Zhang; Bin Benjamin Zhu

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. Venue: NeurIPS 2025 poster. License: CC BY 4.0

Keywords: Black-box adversarial attack; Transferability

Abstract: Adversarial examples crafted on one model often exhibit poor transferability to others, hindering their effectiveness in black-box settings. This limitation arises from two key factors: (i) decision-boundary variation across models and (ii) representation drift in feature space. We address these challenges through a new perspective that frames transferability for untargeted attacks as a consensus-robust optimization problem: adversarial perturbations should remain effective across a neighborhood of plausible target models. To model this uncertainty, we introduce two complementary perturbation channels: a parameter channel, capturing boundary shifts via weight perturbations, and a representation channel, addressing feature drift via stochastic blending of clean and adversarial activations. We then propose CORTA (COnsensus-Robust Transfer Attack), a lightweight attack instantiated from this robust formulation using two first-order strategies: (i) sensitivity regularization based on the squared Frobenius norm of the logits’ Jacobian with respect to weights, and (ii) Monte Carlo sampling for blended feature representations. Our theoretical analysis provides a certified lower bound linking these approximations to the robust objective. Extensive experiments on CIFAR-100 and ImageNet show that CORTA significantly outperforms state-of-the-art transfer-based methods, including ensemble approaches, across CNN and Vision Transformer targets. Notably, CORTA achieves a 19.1 percentage-point gain in transfer success rate over the best prior method while using only a single surrogate model.
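The two first-order ingredients named in the abstract can be illustrated on a toy model. The sketch below is not the authors' implementation: it assumes a single linear classification head `z = W h` on top of a feature vector `h`, a uniform blending coefficient, and cross-entropy as the attack loss. All function names and parameters are hypothetical, and how the two terms are weighted and combined in CORTA is not stated in the abstract, so they are returned separately.

```python
import math
import random


def logits(W, h):
    """Logits of a linear head z = W h (W is a list of rows)."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def cross_entropy(z, label):
    """Numerically stable cross-entropy of logits z against an integer label."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum_exp - z[label]


def jacobian_sq_frobenius(W, h):
    """Squared Frobenius norm of d(logits)/d(W) for the linear head.

    Since dz_i/dW_jk = [i == j] * h_k, the norm reduces to
    num_classes * ||h||^2 (this closed form holds only for this toy head).
    """
    return len(W) * sum(x * x for x in h)


def blended_loss(W, h_clean, h_adv, label, n_samples=8, rng=None):
    """Monte Carlo estimate of the loss over blended features.

    Each sample draws lam ~ U(0, 1) (an assumed distribution) and evaluates
    the loss at lam * h_adv + (1 - lam) * h_clean.
    """
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        lam = rng.random()
        h_mix = [lam * a + (1.0 - lam) * c for a, c in zip(h_adv, h_clean)]
        total += cross_entropy(logits(W, h_mix), label)
    return total / n_samples


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 classes, 2 features
    h_clean, h_adv = [1.0, 2.0], [2.0, 0.5]
    print("sensitivity term:", jacobian_sq_frobenius(W, h_adv))
    print("blended loss:", blended_loss(W, h_clean, h_adv, label=0))
```

In a real surrogate network the Jacobian has no such closed form and would be obtained by automatic differentiation, and the blending would presumably be applied to intermediate activations rather than to a hand-written feature vector.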

Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)

Submission Number: 26763
