Keywords: adversarial machine learning, backdoors, subpopulation attacks
TL;DR: We introduce TOGA, a framework for optimizing backdoor triggers in data ordering attacks, allowing more effective and targeted attacks without poisoning training data.
Abstract: Recent work has shown that backdoors can be learned in neural networks purely through the malicious reordering of clean training data, without modifying labels or inputs. These data ordering attacks rely on gradient alignment: clean samples are ordered so that their gradients approximate those of an adversarial task.
However, the effectiveness of such attacks depends greatly on the choice of the backdoor trigger, which determines how closely clean gradients align with the backdoor gradients. In this work, we introduce TOGA (Trigger Optimization for Gradient Alignment), the first framework for optimizing triggers specifically for data ordering attacks. Our method improves attack success rates by an average of 46\% over prior methods across benchmark datasets (CIFAR-10, CelebA, and ImageNet) and sensitive application domains (ISIC 2018 for dermatology and UCI Credit-g for credit scoring), without compromising clean performance. We further show that optimized triggers can be adapted to create subpopulation-specific backdoors, selectively targeting vulnerable subpopulations. Finally, we show that our method is robust against purification and gradient-similarity defenses. Our findings reveal new security and fairness risks for high-stakes domains, underscoring the need for broader defenses against data ordering attacks.
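To make the gradient-alignment idea in the abstract concrete, the sketch below scores clean batches by how well their gradients align with the gradient of an adversarial (backdoored) batch and reorders them accordingly. This is a minimal illustration assuming a PyTorch classifier; the function names, cosine-similarity scoring, and greedy ordering are illustrative assumptions, not the paper's actual TOGA algorithm or code.

```python
# Hypothetical sketch of gradient-alignment-based data ordering (not the paper's code).
import torch
import torch.nn.functional as F

def flat_grad(model, loss_fn, x, y):
    """Flattened gradient of the loss on (x, y) w.r.t. all trainable parameters."""
    loss = loss_fn(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def order_by_alignment(model, loss_fn, clean_batches, backdoor_x, backdoor_y):
    """Reorder clean batches so their gradients best align with the backdoor gradient."""
    # Gradient of the adversarial task (trigger-stamped inputs with the target label).
    target_grad = flat_grad(model, loss_fn, backdoor_x, backdoor_y)
    scores = []
    for x, y in clean_batches:
        g = flat_grad(model, loss_fn, x, y)
        scores.append(F.cosine_similarity(g, target_grad, dim=0).item())
    # Training on the most-aligned clean batches pushes the model in roughly
    # the same direction as training on poisoned data would, without any poisoning.
    order = sorted(range(len(clean_batches)), key=lambda i: -scores[i])
    return [clean_batches[i] for i in order]
```

Under this view, trigger optimization would amount to choosing the trigger that makes such alignment scores as high as possible for the available clean data, which is what the abstract attributes to TOGA.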
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 21816