Keywords: Machine Learning, Adversarial Robustness, Certification, Adversarial Training, Uncertainty Quantification, Calibration, Deep Learning, Certified Calibration
TL;DR: We introduce certified calibration, a worst-case bound on probability calibration under adversarial attacks, and provide a framework for approximating it and for improving model training to tighten the bound.
Abstract: Since neural classifiers are known to be sensitive to adversarial perturbations that alter their accuracy, certification methods have been developed to provide provable guarantees on the insensitivity of their predictions to such perturbations. On the other hand, in safety-critical applications, the frequentist interpretation of the confidence of a classifier (also known as model calibration) can be of utmost importance. This property can be measured via the Brier score or the expected calibration error. We show that attacks can significantly harm calibration, and thus propose certified calibration, which provides worst-case bounds on calibration under adversarial perturbations. Specifically, we derive analytic bounds for the Brier score and approximate bounds for the expected calibration error via the solution of a mixed-integer program. Finally, we propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training. The code will be publicly released upon acceptance.
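For readers unfamiliar with the two calibration metrics named in the abstract, the following is a minimal NumPy sketch (illustrative only, not the authors' implementation) of the multiclass Brier score and a standard equal-width-binned estimator of the expected calibration error; the function names and binning scheme are assumptions for this example.

```python
import numpy as np

def brier_score(probs, labels):
    """Mean squared distance between predicted probabilities and one-hot labels."""
    one_hot = np.eye(probs.shape[1])[labels]
    return np.mean(np.sum((probs - one_hot) ** 2, axis=1))

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: |accuracy - mean confidence| per bin, weighted by bin frequency."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Toy usage: three 3-class predictions with integer ground-truth labels.
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]])
labels = np.array([0, 1, 2])
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```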
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4772