Keywords: Data Manifold, Adversarial
TL;DR: Experiments that use PCA to define a linear data manifold and study its geometric properties.
Abstract: In this work we study adversarial examples in deep neural networks through the lens of a predefined data manifold.
By enforcing certain geometric properties on this manifold, we are able to analyze the behavior of the learned decision boundaries.
It has been shown previously that training to be robust against adversarial attacks produces models whose gradients align with a small set of principal variations in the data. We demonstrate the converse of this statement: aligning model gradients with a select set of principal variations improves robustness against gradient-based adversarial attacks. Our analysis shows that this alignment also makes the data more orthogonal to the decision boundaries. We conclude that robust training methods make the problem better posed by focusing the model on the more important dimensions of variation.
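A minimal sketch of the gradient-alignment idea from the abstract, assuming a PyTorch setting: the top-k principal components of the training data are taken as a linear data manifold, and a penalty is added on the component of the model's input gradient that falls outside that subspace. The names `top_k_components`, `aligned_loss`, `align_weight`, and `model` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; not the authors' code.
import torch
import torch.nn.functional as F

def top_k_components(X, k):
    """Top-k principal directions of the data matrix X (n_samples, n_features)."""
    # torch.pca_lowrank centers X by default; V has shape (n_features, k).
    _, _, V = torch.pca_lowrank(X, q=k)
    return V  # orthonormal basis of the linear "data manifold"

def aligned_loss(model, x, y, V, align_weight=1.0):
    """Cross-entropy plus a penalty on the off-manifold part of the input gradient."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)
    g = grad.flatten(1)               # (batch, n_features)
    g_on = (g @ V) @ V.T              # projection onto the top-k PCA subspace
    off = (g - g_on).pow(2).sum(dim=1).mean()
    return loss + align_weight * off  # encourages gradients aligned with principal variations
```

The penalty term shrinks the gradient energy orthogonal to the principal subspace, which is one plausible way to operationalize "aligning model gradients with a select set of principal variations."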
Type Of Submission: Extended Abstract (4 pages, non-archival)
Submission Number: 76