Unveiling Hidden Biases in Deep Networks with Classification Images and Spike Triggered Analysis

Anonymous

Sep 25, 2019 · ICLR 2020 Conference Blind Submission
  • Abstract: Classification images and spike-triggered analysis have been widely used in psychophysics and neurophysiology to understand the underlying mechanisms of sensory systems in humans and monkeys. In this paper, we leverage these techniques to investigate the inherent biases of deep neural networks and to obtain a first-order approximation of their functionality. We emphasize convolutional neural networks (CNNs), since they are currently the state-of-the-art methods in computer vision and a good model of human visual processing. In addition, we study multi-layer perceptrons, logistic regression, and recurrent neural networks. Experimenting on three classic datasets, MNIST, Fashion-MNIST, and CIFAR-10, we show that the computed bias maps resemble the target classes and, when used for classification, achieve more than twice the accuracy of chance. Further, we show that classification images can be used to attack a black-box classifier and to detect adversarial patch attacks. Finally, we use spike-triggered averaging to derive the filters of CNNs and explore how a network's behavior changes when neurons in different layers are modulated. Our effort illustrates a successful example of borrowing from neuroscience to study artificial neural networks, and highlights the importance of cross-fertilization and synergy across machine learning, deep learning, and computational neuroscience.
  • Keywords: Classification images, spike triggered analysis, deep learning, network visualization, adversarial attack, adversarial defense, microstimulation, computational neuroscience
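The spike-triggered averaging technique referenced in the abstract can be illustrated with a minimal sketch: present many random noise stimuli to a unit, and average the stimuli that drive it above threshold. Here a toy linear "neuron" with a known receptive field stands in for a unit of a trained network (the filter shape, trial count, and threshold below are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth receptive field of a toy linear unit.
true_filter = np.zeros((8, 8))
true_filter[2:6, 3:5] = 1.0

def neuron_response(stimulus):
    # Stand-in for querying a unit's activation in a real network.
    return float((stimulus * true_filter).sum())

# Spike-triggered average: mean of noise stimuli that elicit a "spike".
n_trials = 20000
sta = np.zeros((8, 8))
n_spikes = 0
for _ in range(n_trials):
    noise = rng.standard_normal((8, 8))
    if neuron_response(noise) > 0:  # treat positive activation as a spike
        sta += noise
        n_spikes += 1
sta /= n_spikes

# The STA should correlate strongly with the underlying filter.
corr = np.corrcoef(sta.ravel(), true_filter.ravel())[0, 1]
print(f"correlation with true filter: {corr:.2f}")
```

For a linear unit driven by Gaussian white noise, the spike-triggered average converges to a scaled copy of the unit's filter, which is why the recovered map correlates strongly with `true_filter`; applying the same averaging to a real CNN unit would simply replace `neuron_response` with the network's activation.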