Keywords: adversarial machine learning, backdoors, subpopulation attacks
TL;DR: We introduce TOGA, a framework for optimizing backdoor triggers in data ordering attacks, allowing more effective and targeted attacks without poisoning training data.
Abstract: Recent work has shown that backdoors can be learned in neural networks purely through the malicious reordering of clean training data, without modifying labels or inputs. These data ordering attacks rely on gradient alignment: clean samples are ordered so that their gradients approximate those of an adversarial task.
However, the effectiveness of such attacks depends greatly on the choice of the backdoor trigger, which determines how closely clean gradients align with the backdoor gradients. In this work, we introduce TOGA (Trigger Optimization for Gradient Alignment), the first framework for optimizing triggers specifically for data ordering attacks. Our method improves attack success rates by an average of 46\% over prior methods across benchmark datasets (CIFAR-10, CelebA, and ImageNet) and sensitive application domains (ISIC 2018 for dermatology and UCI Credit-g for credit scoring), without compromising clean performance. We further show that optimized triggers can be adapted to create subpopulation-specific backdoors, selectively targeting vulnerable subpopulations. Finally, we show that our method is robust against purification and gradient-similarity defenses. Our findings reveal new security and fairness risks for high-stakes domains, underscoring the need for broader defenses against data ordering attacks.
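To make the gradient-alignment idea in the abstract concrete, the sketch below scores clean batches by how well their gradients align with the gradient of an adversarial (backdoored) batch and reorders them accordingly. This is a minimal illustration assuming a PyTorch classifier; the function names, cosine-similarity scoring, and greedy ordering are illustrative assumptions, not the paper's actual TOGA algorithm or code.

```python
# Hypothetical sketch of gradient-alignment-based data ordering (not the paper's code).
import torch
import torch.nn.functional as F

def flat_grad(model, loss_fn, x, y):
    """Flattened gradient of the loss on (x, y) w.r.t. all trainable parameters."""
    loss = loss_fn(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def order_by_alignment(model, loss_fn, clean_batches, backdoor_x, backdoor_y):
    """Reorder clean batches so their gradients best align with the backdoor gradient."""
    # Gradient of the adversarial task (trigger-stamped inputs with the target label).
    target_grad = flat_grad(model, loss_fn, backdoor_x, backdoor_y)
    scores = []
    for x, y in clean_batches:
        g = flat_grad(model, loss_fn, x, y)
        scores.append(F.cosine_similarity(g, target_grad, dim=0).item())
    # Training on the most-aligned clean batches pushes the model in roughly
    # the same direction as training on poisoned data would, without any poisoning.
    order = sorted(range(len(clean_batches)), key=lambda i: -scores[i])
    return [clean_batches[i] for i in order]
```

Under this view, trigger optimization would amount to choosing the trigger that makes such alignment scores as high as possible for the available clean data, which is what the abstract attributes to TOGA.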
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 21816