Adversarial Training with Rectified RejectionDownload PDF

29 Sept 2021 (modified: 22 Oct 2023)ICLR 2022 Conference Withdrawn SubmissionReaders: Everyone
Keywords: Adversarial Training, Rectified Rejection, Coupling Strategy
Abstract: Adversarial training (AT) is one of the most effective strategies for promoting model robustness, whereas even the state-of-the-art adversarially trained models struggle to exceed 65% robust test accuracy on CIFAR-10 without additional data, which is far from practical. A natural way to improve beyond this accuracy bottleneck is to introduce a rejection option, where confidence is a commonly used certainty proxy. However, the vanilla confidence can overestimate the model certainty if the input is wrongly classified. To this end, we propose to use true confidence (T-Con) (i.e., predicted probability of the true class) as a certainty oracle, and learn to predict T-Con by rectifying confidence. Intriguingly, we prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones. We also quantify that training R-Con to be aligned with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is well compatible with different AT frameworks on improving robustness, with little extra computation.
One-sentence Summary: We propose a rectified rejection module, which exploits a coupling rejection strategy to distinguish correctly and wrongly classified examples provably.
Supplementary Material: zip
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/arxiv:2105.14785/code)
5 Replies

Loading