Explaining Adversarial Examples with Knowledge Representation

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Blind Submission · Readers: Everyone
Abstract: Adversarial examples are modified samples that preserve the structure of the original image yet mislead classifiers. Researchers have put effort into developing methods for generating adversarial examples and into investigating their origins, with much past work focusing on the decision-boundary changes these methods induce. This paper, in contrast, discusses the origin of adversarial examples from a more fundamental knowledge-representation point of view. Human beings can learn and classify prototypes as well as transformations of objects, whereas neural networks store learned knowledge in a more hybrid way, combining all prototypes and transformations into a single distribution. Hybrid storage may reduce the distances between different classes, so that small modifications can mislead the classifier. A one-step distribution imitation method is designed to imitate the distribution of the nearest neighbor from a different class. Experiments show that simply imitating distributions from the training set, without any knowledge of the classifier, can still noticeably affect the classification results of deep networks. This also implies that adversarial examples can take more forms than small perturbations. Potential ways of alleviating adversarial examples are discussed from the representation point of view. The first path is to change the encoding of the data sent to the training step: training data that are more prototypical can help capture more robust and accurate structural knowledge. The second path is to construct learning frameworks with improved representations.
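As a rough illustration of what a one-step imitation of the nearest different-class neighbor's distribution could look like, the sketch below shifts a sample's pixel statistics toward those of its nearest training neighbor from another class. The function names, the choice of mean/std statistics, and the step size `alpha` are assumptions made for illustration only, not the exact procedure described in the paper.

```python
import numpy as np

def nearest_other_class(x, X_train, y_train, own_label):
    """Return the nearest training sample whose label differs from own_label."""
    candidates = X_train[y_train != own_label]
    dists = np.linalg.norm(
        candidates.reshape(len(candidates), -1) - x.reshape(1, -1), axis=1)
    return candidates[np.argmin(dists)]

def one_step_imitation(x, neighbor, alpha=0.3):
    """Move the sample's first- and second-order pixel statistics one step
    toward the neighbor's, as a crude proxy for imitating its distribution."""
    mu_x, sigma_x = x.mean(), x.std() + 1e-8
    mu_n, sigma_n = neighbor.mean(), neighbor.std() + 1e-8
    mu_t = (1 - alpha) * mu_x + alpha * mu_n            # interpolated target mean
    sigma_t = (1 - alpha) * sigma_x + alpha * sigma_n   # interpolated target std
    x_imitated = (x - mu_x) / sigma_x * sigma_t + mu_t  # re-standardize the sample
    return np.clip(x_imitated, 0.0, 1.0)                # keep pixels in a valid range

# Hypothetical usage: x is one image in [0, 1], y_x its label.
# x_adv = one_step_imitation(x, nearest_other_class(x, X_train, y_train, y_x))
```

Note that this sketch never queries the classifier, matching the abstract's claim that imitation using only the training set can still change deep-network predictions.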
Keywords: adversarial example, knowledge representation, distribution imitation
TL;DR: Hybrid storage and representation of learned knowledge may be a reason for adversarial examples.
