Abstract: Imperceptible adversarial attacks aim to mislead deep neural networks by adding signal-domain perturbations that induce misclassification while remaining visually indistinguishable from the original signal. Existing methods rely on untargeted loss maximisation, producing perturbations poorly aligned with decision boundaries and providing limited control over locality and perceptual cost. To address these limitations, we propose $\textbf{Contrastive Counterfactual Generation}$ ($\texttt{CoCoGen}$), a cross-domain adversarial attack framework that formulates perturbation synthesis as a constrained optimisation problem. \texttt{CoCoGen} explicitly targets the nearest decision boundary by minimising the $\textit{contrastive counterfactual margin}$ under a strict signal-energy budget. Perturbations are localised via gradient-based Top-$k$ spatial projection and confined to the high-frequency subspace using a Fourier-domain projection operator, leveraging reduced human sensitivity to high spatial frequencies. The objective is optimised using masked gradient descent with
momentum, while an adaptive sparsity grid search identifies minimal feasible signal support. Experiments across multiple architectures show that $\texttt{CoCoGen}$ achieves $100\%$ Attack Success Rate (vs. $80-100\%$ for prior methods, with most below $99\%$) while maintaining a MUSIQ score of $61-63$ (vs. $36-55$), outperforming prior methods in both attack efficacy and visual quality.
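The two projection steps described in the abstract can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: the radial frequency cutoff (`cutoff_ratio`) and the toy perturbation sizes are hypothetical choices made for illustration, using NumPy's FFT routines.

```python
import numpy as np

def project_high_frequency(delta, cutoff_ratio=0.25):
    """Keep only Fourier components above a radial frequency cutoff.

    Illustrative stand-in for the paper's Fourier-domain projection;
    cutoff_ratio is a hypothetical parameter, not from the paper.
    """
    h, w = delta.shape
    F = np.fft.fftshift(np.fft.fft2(delta))  # center the zero frequency
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    # Suppress low spatial frequencies, to which humans are more sensitive
    mask = radius >= cutoff_ratio * min(h, w) / 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

def topk_spatial_mask(grad, k):
    """Binary mask selecting the k pixels with the largest gradient magnitude."""
    flat = np.abs(grad).ravel()
    idx = np.argpartition(flat, -k)[-k:]  # indices of the k largest entries
    mask = np.zeros_like(flat)
    mask[idx] = 1.0
    return mask.reshape(grad.shape)

# Toy usage on a random perturbation and loss gradient
rng = np.random.default_rng(0)
delta = rng.standard_normal((32, 32))
grad = rng.standard_normal((32, 32))
delta_hf = project_high_frequency(delta)
spatial_mask = topk_spatial_mask(grad, k=64)
delta_proj = delta_hf * spatial_mask  # sparse, high-frequency perturbation
print(int(np.count_nonzero(spatial_mask)))
```

In the full method, a projection of this kind would be applied inside each step of masked momentum gradient descent, with the sparsity level $k$ chosen by the adaptive grid search mentioned above.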
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Vidya_Muthukumar3
Submission Number: 8264