Adversarial Training with Rectified Rejection

Tianyu Pang; Huishuai Zhang; Di He; Yinpeng Dong; Hang Su; Wei Chen; Jun Zhu; Tie-Yan Liu

29 Sept 2021 (modified: 08 Jun 2025) | ICLR 2022 Conference Withdrawn Submission | Readers: Everyone

Keywords: Adversarial Training, Rectified Rejection, Coupling Strategy

Abstract: Adversarial training (AT) is one of the most effective strategies for promoting model robustness, yet even state-of-the-art adversarially trained models struggle to exceed 65% robust test accuracy on CIFAR-10 without additional data, which is far from practical. A natural way to move beyond this accuracy bottleneck is to introduce a rejection option, for which confidence is a commonly used certainty proxy. However, vanilla confidence can overestimate model certainty when the input is wrongly classified. To address this, we propose to use true confidence (T-Con), i.e., the predicted probability of the true class, as a certainty oracle, and to learn to predict T-Con by rectifying confidence. Intriguingly, we prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones. We also quantify that training R-Con to align with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is compatible with different AT frameworks for improving robustness, with little extra computation.
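To make the distinction between confidence and T-Con concrete, the following is a minimal sketch in plain Python. It is not the authors' RR module: the function names, the toy logits, and the 0.5 threshold are all assumptions chosen for illustration. It relies only on a basic property of probability vectors: when the prediction is correct, T-Con equals confidence, and when the prediction is wrong, T-Con cannot exceed 1/2 because the predicted class already holds at least as much probability as the true class.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(probs):
    """Vanilla confidence: the largest predicted probability."""
    return max(probs)

def true_confidence(probs, label):
    """T-Con: predicted probability of the true class (needs the label)."""
    return probs[label]

def accept_by_t_con(probs, label, threshold=0.5):
    """Hypothetical oracle rejector: accept only if T-Con exceeds the threshold."""
    return true_confidence(probs, label) > threshold

# Toy (logits, true_label) pairs; values are illustrative assumptions.
examples = [
    ([4.0, 0.5, 0.1], 0),  # correctly classified, high confidence
    ([3.0, 0.2, 0.1], 1),  # wrongly classified, yet high confidence
    ([1.0, 0.9, 0.8], 0),  # correctly classified, low confidence
]

results = []
for logits, label in examples:
    probs = softmax(logits)
    pred = probs.index(max(probs))
    results.append({
        "correct": pred == label,
        "confidence": confidence(probs),
        "t_con": true_confidence(probs, label),
        "accepted": accept_by_t_con(probs, label),
    })

for r in results:
    print(r)
```

In the second example the confidence is high although the prediction is wrong, while T-Con is low, so the oracle rejector discards it. Since T-Con requires the true label, it is unavailable at test time, which is why the abstract proposes learning R-Con as a predictor of it. The third example shows that a fixed threshold on T-Con can also reject correct but low-confidence inputs.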

One-sentence Summary: We propose a rectified rejection module that exploits a coupling rejection strategy to provably distinguish correctly classified examples from wrongly classified ones.

Supplementary Material: zip

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/adversarial-training-with-rectified-rejection/code)
