Abstract: A white noise analysis of modern deep neural networks is presented to unveil
their biases at the whole-network or single-neuron level. Our analysis is
based on two popular and related methods in psychophysics and neurophysiology,
namely classification images and spike-triggered analysis. These methods have
been widely used to understand the underlying mechanisms of sensory systems
in humans and monkeys. We leverage them to investigate the inherent biases of
deep neural networks and to obtain a first-order approximation of their functionality.
We focus on CNNs, since they are currently the state-of-the-art methods
in computer vision and a decent model of human visual processing. In
addition, we study multi-layer perceptrons, logistic regression, and recurrent neural
networks. Experiments over four classic datasets, MNIST, Fashion-MNIST,
CIFAR-10, and ImageNet, show that the computed bias maps resemble the target
classes and, when used for classification, lead to performance more than twice
the chance level. Further, we show that classification images can be used to attack
a black-box classifier and to detect adversarial patch attacks. Finally, we utilize
spike-triggered averaging to derive the filters of CNNs and to explore how the behavior
of a network changes when neurons in different layers are modulated. Our
effort illustrates a successful example of borrowing from neuroscience to study
ANNs and highlights the importance of cross-fertilization and synergy across machine
learning, deep learning, and computational neuroscience.
Keywords: Classification images, spike-triggered analysis, deep learning, network visualization, adversarial attack, adversarial defense, microstimulation, computational neuroscience
Code: https://github.com/aliborji/WhiteNoiseAnalysis.git
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:1912.12106/code)