Controlling Over-generalization and its Effect on Adversarial Examples Detection and Generation

Mahdieh Abbasi; Arezoo Rajabi; Azadeh Sadat Mozafari; Rakesh B. Bobba; Christian Gagné

Controlling Over-generalization and its Effect on Adversarial Examples Detection and Generation

Mahdieh Abbasi, Arezoo Rajabi, Azadeh Sadat Mozafari, Rakesh B. Bobba, Christian Gagné

27 Sept 2018 (modified: 22 Jun 2025)ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: Convolutional Neural Networks (CNNs) significantly improve the state-of-the-art for many applications, especially in computer vision. However, CNNs still suffer from a tendency to confidently classify out-distribution samples from unknown classes into pre-defined known classes. Further, they are also vulnerable to adversarial examples. We are relating these two issues through the tendency of CNNs to over-generalize for areas of the input space not covered well by the training set. We show that a CNN augmented with an extra output class can act as a simple yet effective end-to-end model for controlling over-generalization. As an appropriate training set for the extra class, we introduce two resources that are computationally efficient to obtain: a representative natural out-distribution set and interpolated in-distribution samples. To help select a representative natural out-distribution set among available ones, we propose a simple measurement to assess an out-distribution set's fitness. We also demonstrate that training such an augmented CNN with representative out-distribution natural datasets and some interpolated samples allows it to better handle a wide range of unseen out-distribution samples and black-box adversarial examples without training it on any adversaries. Finally, we show that generation of white-box adversarial attacks using our proposed augmented CNN can become harder, as the attack algorithms have to get around the rejection regions when generating actual adversaries.

Keywords: Convolutional Neural Networks, Adversarial Instances, Out-distribution Samples, Rejection Option, Over-generalization

TL;DR: Properly training CNNs with dustbin class increase their robustness to adversarial attacks and their capacity to deal with out-distribution samples.

Data: [MNIST](https://paperswithcode.com/dataset/mnist)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/controlling-over-generalization-and-its/code)

18 Replies

Loading