Representation Quality Of Neural Networks Links To Adversarial Attacks and Defences

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Keywords: Understanding Neural Networks, Representation Metrics, Adversarial Machine Learning, Adversarial Attacks, Adversarial Defences
Abstract: Neural networks have been shown to be vulnerable to a variety of adversarial algorithms. A crucial step towards understanding the rationale behind this lack of robustness is to assess how well a neural network's representation encodes the existing features. Here, we propose a method for understanding the representation quality of neural networks using a novel test based on Zero-Shot Learning, entitled Raw Zero-Shot. The principal idea is that if an algorithm learns rich features, those features should be able to interpret 'new or unknown' classes as combinations of previously learned features, because unknown classes usually share several regular features with recognised (learned) classes, provided the learned features are general enough. We further introduce two metrics to assess this learned representation of unknown classes. One is based on an inter-cluster validation technique, while the other is based on the difference in the representation between the case when a class is unknown and the case when it is known to the classifier. Experiments suggest that several adversarial defences not only reduce the accuracy of some attacks but also improve the representation quality of the classifiers. Further, a low p-value in a paired-samples t-test suggests that several adversarial defences, in general, change the representation quality significantly. Moreover, the experiments also reveal a relationship between the proposed metrics and adversarial attacks (a high Pearson Correlation Coefficient (PCC) and a low p-value).
One-sentence Summary: The article links the representation quality of neural networks, evaluated using unknown classes, to adversarial attacks and defences.
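The Raw Zero-Shot idea summarised above can be illustrated with a small toy sketch: train a classifier with one class withheld, then examine how its soft outputs represent samples of that unknown class. This is not the authors' code; it uses a hypothetical nearest-centroid classifier on synthetic Gaussian blobs, and a simple centroid-separation score as a stand-in for the paper's inter-cluster validation metric.

```python
import numpy as np

# Raw Zero-Shot sketch (illustrative toy, not the authors' method):
# train with one class withheld ("unknown"), then inspect how the
# soft outputs express that unknown class via the learned classes.

rng = np.random.default_rng(0)

# Toy data: three Gaussian blobs in 2-D; class 2 is the unknown class.
means = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
X = np.concatenate([m + rng.normal(scale=0.5, size=(50, 2)) for m in means])
y = np.repeat([0, 1, 2], 50)

unknown = 2
known = y != unknown

# Nearest-centroid "classifier" trained only on the known classes.
centroids = np.stack([X[known & (y == c)].mean(axis=0) for c in (0, 1)])

# Soft outputs: softmax over negative distances to the known centroids,
# so unknown-class points are expressed as a mix of known-class features.
d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
logits = -d
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Crude inter-cluster validity proxy: mean distance between the unknown
# class's soft-output centroid and those of the known classes.
unk_center = probs[y == unknown].mean(axis=0)
known_centers = [probs[y == c].mean(axis=0) for c in (0, 1)]
sep = np.mean([np.linalg.norm(unk_center - kc) for kc in known_centers])
print(f"unknown-class separation in soft-output space: {sep:.3f}")
```

In this toy setting, a larger separation score would indicate that the unknown class occupies a distinct, well-clustered region of the soft-output space, which is the kind of signal the paper's clustering-based metric is designed to capture.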
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=f6BA5TkK8I