Robustness May Be at Odds with Accuracy

Dimitris Tsipras; Shibani Santurkar; Logan Engstrom; Alexander Turner; Aleksander Madry

Robustness May Be at Odds with Accuracy

Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry

Published: 21 Dec 2018, Last Modified: 22 Jun 2025ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: We show that there exists an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. We demonstrate that this trade-off between the standard accuracy of a model and its robustness to adversarial perturbations provably exists even in a fairly simple and natural setting. These findings also corroborate a similar phenomenon observed in practice. Further, we argue that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers. These differences, in particular, seem to result in unexpected benefits: the features learned by robust models tend to align better with salient data characteristics and human perception.

Keywords: adversarial examples, robust machine learning, robust optimization, deep feature representations

TL;DR: We show that adversarial robustness might come at the cost of standard classification performance, but also yields unexpected benefits.

Code: [![Papers with Code](/images/pwc_icon.svg) 7 community implementations](https://paperswithcode.com/paper/?openreview=SyxAb30cY7)

Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 9 code implementations](https://www.catalyzex.com/paper/robustness-may-be-at-odds-with-accuracy/code)

9 Replies

Loading