Certifiably Robust Classifiers: Bridging the Gap Between Theory and Practice

ICLR 2026 Conference Submission 21823 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: adversarial robustness, certified robustness, deep learning theory
Abstract: Deep learning models are vulnerable to adversarial attacks, raising serious concerns about their use in safety-critical applications. Existing empirical defenses are effective in practice but lack theoretical guarantees, while provable defenses provide a certified robustness radius that is significantly smaller than the robustness achieved empirically. In this work, we design robust classifiers that leverage the structure of the underlying data distribution, bridging the gap between theoretical certification and strong practical performance. First, we focus on a simple setting where the data distribution is a Gaussian mixture and provide necessary and sufficient conditions under which a robust classifier is guaranteed to exist. We also propose a provably robust classifier along with its certificate of robustness and a generalization guarantee for the learned certified radius. Next, we generalize our approach to arbitrary data distributions by using an encoder network to map the input data to a mixture of Gaussians, and again provide a robust classifier with a guaranteed certificate of robustness. Experiments on benchmark datasets indicate that our method outperforms the top existing baselines for certified accuracy on the CIFAR-10 dataset, while achieving competitive performance on ImageNet, even against computationally demanding prior methods.
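To make the Gaussian-mixture setting concrete, here is a minimal illustrative sketch (not the authors' classifier): for a two-component Gaussian mixture with a shared covariance, the Bayes-optimal classifier is linear, and any linear classifier sign(w·x + b) admits an exact ℓ2 robustness certificate equal to the distance from x to the decision boundary. All variable names (`mu0`, `mu1`, `w`, `b`) are assumptions for this toy example.

```python
import numpy as np

def linear_certified_radius(x, w, b):
    """l2 distance from x to the hyperplane w.x + b = 0.

    For a linear classifier sign(w.x + b), no perturbation of l2 norm
    smaller than this radius can change the prediction at x.
    """
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# Toy two-class Gaussian mixture with identity covariance (assumed setup).
mu0 = np.array([-1.0, 0.0])
mu1 = np.array([1.0, 0.0])

# Bayes-optimal linear classifier for equal priors: boundary is the
# perpendicular bisector of the segment between the two means.
w = mu1 - mu0
b = -0.5 * (np.dot(mu1, mu1) - np.dot(mu0, mu0))

x = np.array([0.8, 0.3])            # a sample point
r = linear_certified_radius(x, w, b)
# r is the certified l2 radius at x: here |2*0.8 + 0| / 2 = 0.8
```

This exact certificate for linear classifiers is what makes the Gaussian-mixture case analytically tractable; the harder part, which the paper addresses, is extending such guarantees when an encoder maps complex data into this latent mixture.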
Supplementary Material: pdf
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 21823