Building the dual graph of the activation regions in a deep neural network: what it means for interpretability
Keywords: activation regions, interpretability, generalization
Abstract: Understanding the geometric representations of deep neural networks (DNNs) that employ piecewise linear activation functions has become a popular research direction in model explainability.
A complete geometric picture of the representations of a DNN would include both the polytope regions formed by the network partitions and the set of neighboring regions, i.e., a dual graph.
Prior work has produced algorithms that enumerate all of the activation regions formed by a network, but no algorithm has been proposed for constructing the dual graph in its entirety.
This gap may stem from the naive assumption that because identifying neighboring regions is trivial in shallow networks, it is also trivial in deep networks.
In this work, we demonstrate that this assumption is false; finding neighboring regions in a deep network is in fact a difficult problem due to the conditional nature of the partitions in the deep layers.
We introduce a method for neighbor finding in DNNs and implement it alongside region enumeration; together, these construct the dual graph.
Further, we demonstrate the usefulness of the graph in the context of generalization.
We show that test data that are near training data, as measured by path length along the graph, tend to yield the best generalization results.
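To make the shallow case concrete (the one the abstract describes as trivial): in a one-hidden-layer ReLU network, the activation regions are the cells of a hyperplane arrangement, and two regions are neighbors exactly when their activation patterns differ in a single neuron. The sketch below, which is an illustrative assumption rather than the paper's algorithm, enumerates patterns by sampling a toy 2-D network and links patterns at Hamming distance one.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Toy one-hidden-layer ReLU net: 4 neurons over a 2-D input space.
# (Hypothetical weights for illustration only.)
W = rng.normal(size=(4, 2))
b = rng.normal(size=4)

def pattern(x):
    """Activation pattern: which neurons are 'on' at input x."""
    return tuple((W @ x + b > 0).astype(int))

# Enumerate regions by dense sampling (adequate for a toy 2-D example;
# the paper's setting instead requires exact enumeration).
pts = rng.uniform(-3, 3, size=(20000, 2))
regions = {pattern(x) for x in pts}

# Shallow case: two regions are neighbors iff their patterns differ in
# exactly one neuron, i.e. they share a facet of one hyperplane. This
# simple rule is precisely what fails to carry over to deep layers,
# whose partitions are conditional on earlier-layer activations.
edges = [(p, q) for p, q in combinations(regions, 2)
         if sum(a != c for a, c in zip(p, q)) == 1]

print(len(regions), "regions,", len(edges), "dual-graph edges")
```

The set of `regions` gives the nodes and `edges` the adjacencies of the dual graph for this shallow network; the paper's contribution is an algorithm that recovers these adjacencies when the one-bit-flip rule no longer applies.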
Primary Area: interpretability and explainable AI
Submission Number: 22776