TL;DR: We introduce layerwise origin-target synthesis (LOTS), which can be used to visualize the internal representations of deep neural networks and to generate adversarial examples.
Abstract: Deep neural networks have recently demonstrated excellent performance on various tasks. Despite these advances, our understanding of such learning models remains incomplete, as revealed by their unexpected vulnerability to imperceptibly small, non-random perturbations. The existence of these so-called adversarial examples poses a serious problem for the deployment of vulnerable machine learning models. In this paper, we introduce layerwise origin-target synthesis (LOTS), which serves multiple purposes. First, it can be used as a visualization technique that gives insight into the function of any intermediate feature layer by showing how a particular input is represented at that layer. Second, it can be applied to assess the invariance of the features learned at any layer with respect to the class of a particular input. Third, LOTS provides a general way of producing a large number of diverse adversarial examples, which can be used in training to further improve the robustness and performance of machine learning models. However, we show that the resulting improvement in robustness varies across adversarial types and depends strongly on the type used for training: some adversarial generation techniques are almost completely immune to adversarial training.
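To make the approach concrete, the sketch below shows one way a LOTS-style perturbation could be computed: the origin input is iteratively adjusted so that its activation at a chosen layer moves toward the activation of a target input. This is a minimal sketch, assuming a Euclidean loss between layer activations and normalized gradient steps; the model, layer handle, step count, and step size are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a LOTS-style perturbation (assumptions noted above):
# pull the origin input's activation at a chosen layer toward the
# activation of a target input via gradient descent on the input.
import torch


def lots_perturb(model, layer, x_origin, x_target, steps=50, step_size=0.01):
    """Return a perturbed copy of x_origin whose activation at `layer`
    approaches the activation of x_target at the same layer."""
    model.eval()
    activations = {}

    def hook(_module, _inputs, output):
        # Capture the chosen layer's output on every forward pass.
        activations["feat"] = output

    handle = layer.register_forward_hook(hook)
    try:
        # Record the target's activation once; no gradients needed.
        with torch.no_grad():
            model(x_target)
            target_feat = activations["feat"].detach()

        x = x_origin.clone().detach().requires_grad_(True)
        for _ in range(steps):
            model(x)
            # Euclidean distance between origin and target activations.
            loss = 0.5 * (target_feat - activations["feat"]).pow(2).sum()
            grad, = torch.autograd.grad(loss, x)
            with torch.no_grad():
                # Normalized gradient step toward the target activation.
                x -= step_size * grad / (grad.norm() + 1e-12)
        return x.detach()
    finally:
        handle.remove()
```

As a hypothetical usage example, with a torchvision ResNet one might call lots_perturb(net, net.layer4, x_origin, x_target); sweeping the layer argument from shallow to deep layers would yield the layerwise visualizations the abstract refers to.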
Conflicts: uccs.edu, idiap.ch