Keywords: evaluation of interpretability, feature visualization, activation maximization, human psychophysics, understanding CNNs, explanation method
TL;DR: Using human psychophysical experiments, we show that natural images can be significantly more informative for interpreting neural network activations than synthetic feature visualizations.
Abstract: Feature visualizations such as synthetic maximally activating images are a widely used explanation method to better understand the information processing of convolutional neural networks (CNNs). At the same time, there are concerns that these visualizations might not accurately represent CNNs' inner workings. Here, we measure how much extremely activating images help humans in predicting CNN activations. Using a well-controlled psychophysical paradigm, we compare the informativeness of synthetic images by Olah et al. with a simple baseline visualization, namely natural images that also strongly activate a specific feature map. Given either synthetic or natural reference images, human participants choose which of two query images leads to strong positive activation. The experiment is designed to maximize participants' performance, and is the first to probe intermediate instead of final layer representations. We find that synthetic images indeed provide helpful information about feature map activations (82 ± 4% accuracy; chance would be 50%). However, natural images—originally intended to be a baseline—outperform these synthetic images by a wide margin (92 ± 2% accuracy). The superiority of natural images holds across the investigated network and various conditions. Therefore, we argue that visualization methods should improve over this simple baseline.
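The natural-image baseline described in the abstract amounts to ranking a pool of natural images by how strongly they activate a given feature map and taking the extremes as reference images. A minimal sketch of that selection step, assuming the per-image activations have already been extracted from an intermediate CNN layer (here simulated with random numbers; the array name, pool size, and the choice of k images are illustrative assumptions, not details from the paper):

```python
import numpy as np

# Hypothetical setup: mean spatial activation of ONE feature map for each
# image in a natural-image pool. In the real experiment these values would
# come from an intermediate CNN layer; here we simulate them.
rng = np.random.default_rng(0)
activations = rng.normal(size=1000)  # one scalar per natural image

k = 9  # number of reference images per condition (an assumed value)

# Most strongly positively activating images serve as reference
# visualizations; the weakest ones could serve as a contrasting set.
top_idx = np.argsort(activations)[-k:][::-1]   # strongest first
bottom_idx = np.argsort(activations)[:k]       # weakest first

print("strongest activations:", activations[top_idx])
print("weakest activations:  ", activations[bottom_idx])
```

Participants would then see the images behind `top_idx` as references and judge which of two query images drives the feature map more strongly.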
Supplementary Material: zip