RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance

ICLR 2026 Conference Submission 19776 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Certified backdoor detection, imbalanced data
Abstract: Deep neural networks are highly vulnerable to backdoor attacks, yet most existing defenses assume balanced data and overlook the pervasive class imbalance found in real-world settings—an omission that can amplify backdoor risk. This paper offers the first in-depth study of how dataset imbalance intensifies backdoor vulnerability, showing that (i) imbalance induces majority-class bias that increases susceptibility, and (ii) standard defenses degrade markedly as imbalance grows. To address this, we introduce Randomized Probability Perturbation (RPP), a certified poisoned-sample detection framework that works in a black-box setting using only model output probabilities. For any inspected input, RPP decides whether it has been backdoor-manipulated and provides provable within-domain detectability guarantees along with a probabilistic upper bound on the false positive rate. Extensive experiments on five benchmarks (MNIST, SVHN, CIFAR-10, TinyImageNet, and ImageNet-10), covering 10 backdoor attacks and 11 baseline defenses, demonstrate that RPP achieves substantially higher detection accuracy than state-of-the-art defenses, especially under class imbalance. RPP thus establishes a theoretical and practical foundation for defending against backdoor attacks in imbalanced real-world environments.
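The abstract states only that RPP operates in a black-box setting and inspects model output probabilities under randomized perturbation; the actual detection rule and certification procedure are not described here. The sketch below is therefore a hypothetical illustration of a generic randomized-perturbation detector of that flavor, not the paper's method: the function name `rpp_style_detect`, the Gaussian noise model, and the threshold `tau` are all assumptions chosen for exposition.

```python
import numpy as np


def rpp_style_detect(predict_proba, x, n_samples=100, sigma=0.1, tau=0.9, seed=None):
    """Illustrative black-box check for a possibly backdoor-manipulated input.

    predict_proba : callable mapping an input array to a class-probability vector
    x             : inspected input (e.g. an image as a float array in [0, 1])
    n_samples     : number of random perturbations to draw (hypothetical choice)
    sigma         : perturbation scale (hypothetical choice)
    tau           : agreement threshold above which the input is flagged (hypothetical)
    """
    rng = np.random.default_rng(seed)
    base_label = int(np.argmax(predict_proba(x)))

    agree = 0
    for _ in range(n_samples):
        # Query the black-box model on a randomly perturbed copy of the input.
        noisy = np.clip(x + sigma * rng.standard_normal(x.shape), 0.0, 1.0)
        agree += int(np.argmax(predict_proba(noisy)) == base_label)

    agreement = agree / n_samples
    # Heuristic: triggered inputs are often mapped to the attacker's target label
    # very robustly, so unusually high agreement under random perturbation is
    # treated as suspicious. The real RPP provides certified guarantees instead.
    return agreement >= tau, agreement
```

This toy detector only illustrates the black-box, probability-only access pattern described in the abstract; it offers none of the within-domain detectability guarantees or false-positive-rate bounds that the paper claims for RPP.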
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 19776