Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: Deep Neural Networks (DNNs) have recently been shown to be vulnerable to adversarial examples, which are carefully crafted instances that can mislead DNNs into making errors during prediction. To better understand such attacks, the properties of subspaces in the neighborhood of adversarial examples need to be characterized. In particular, effective measures are required to discriminate adversarial examples from normal examples in such subspaces. We tackle this challenge by characterizing the intrinsic dimensional property of adversarial subspaces using Local Intrinsic Dimensionality (LID). LID assesses the space-filling capability of the subspace surrounding a reference example, based on the distribution of distances from the example to its neighbors. We first explain how adversarial perturbation affects the LID characteristic of adversarial subspaces. Then, we explain how the LID characteristic can be used to discriminate adversarial examples generated using state-of-the-art attacks. We empirically show that the LID characteristic can outperform several state-of-the-art detection measures by large margins for five attacks across three benchmark datasets. Our analysis of the LID characteristic for adversarial subspaces not only motivates new directions for effective adversarial defense but also opens up more challenges for developing new attacks to better understand the vulnerabilities of DNNs.
  • TL;DR: We characterize the intrinsic dimensional property of adversarial subspaces in the neighborhood of adversarial examples using Local Intrinsic Dimensionality (LID), and empirically show that this characteristic can effectively discriminate adversarial examples from normal examples.
  • Keywords: Adversarial Subspace, Local Intrinsic Dimensionality, Adversarial Detection, Adversarial Defense, Deep Neural Networks
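
The abstract describes LID as a quantity computed from the distribution of distances between a reference example and its neighbors, but it does not say which estimator is used. As a rough illustration only, the sketch below uses the maximum-likelihood (Hill-type) estimator that is common in the LID literature; treating it as the estimator behind the reported results is an assumption. The function name `lid_mle`, the parameter `k`, and the choice of Euclidean distance are all hypothetical choices made for this example.

```python
import numpy as np

def lid_mle(reference, batch, k=20):
    """Estimate the LID of `reference` from its k nearest neighbors in `batch`.

    Assumed estimator (not taken from the abstract):
        LID ~= -( (1/k) * sum_i log(r_i / r_k) )^(-1)
    where r_1 <= ... <= r_k are the distances to the k nearest neighbors.
    """
    reference = np.asarray(reference, dtype=float)
    batch = np.asarray(batch, dtype=float)

    # Euclidean distance from the reference point to every point in the batch.
    dists = np.linalg.norm(batch - reference, axis=1)

    # Drop zero distances (the reference itself or exact duplicates),
    # since log(0) is undefined.
    dists = dists[dists > 0]
    if k < 2 or dists.size < k:
        raise ValueError("need k >= 2 and at least k neighbors at nonzero distance")

    # Distances to the k nearest neighbors, in ascending order.
    r = np.sort(dists)[:k]

    # Log-ratios against the largest of the k distances; the last term is log(1) = 0.
    # If all k distances are identical the mean is 0 and the estimate is not finite;
    # that corner case is not handled here.
    return -1.0 / np.mean(np.log(r / r[-1]))
```

For points spread uniformly through a d-dimensional region, this estimate should come out close to d when the reference lies well inside the region. In a detection setting of the kind the abstract describes, one would presumably compute such estimates for a test input against a batch of normal examples and pass them to a classifier; those details are not given here.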