Abstract: We study how to explain the main steps of inference that a pre-trained deep neural net (DNN) relies on to produce predictions for a (sub)task and its data.
This problem is related to network pruning and interpretable machine learning, but differs in three respects: (1) fine-tuning of any neurons/filters is forbidden; (2) we target a very high pruning rate, e.g., ≥ 95%, for better interpretability; (3) the interpretation covers the whole inference process on a few samples from a task rather than individual neurons/filters or a single sample.
In this paper, we introduce NeuroChains to extract the local inference chains by optimizing differentiable sparse scores for the filters and layers, which reflect their importance in preserving the outputs on a few samples drawn from a given (sub)task.
Thereby, NeuroChains can extract an extremely small sub-network composed of critical filters exactly copied from the original pre-trained DNN by removing the filters/layers with small scores.
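The score-then-prune idea described above can be illustrated with a toy sketch. This is not the NeuroChains implementation: it assumes a hypothetical "network" whose output is simply the sum of per-filter contributions (the columns of `F`), uses a squared-error output-preservation loss with an L1 penalty, and optimizes with proximal gradient descent. The names `fit_sparse_scores` and `prune`, the loss, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np

def fit_sparse_scores(F, y, lam=0.05, steps=500):
    """Learn one score per filter so that F @ s stays close to y (toy stand-in).

    F: (n_samples, n_filters) per-filter contributions of the frozen model.
    y: (n_samples,) outputs of the frozen model that should be preserved.
    """
    n, k = F.shape
    s = np.ones(k)                           # start with every filter switched on
    lr = n / np.linalg.norm(F, 2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        grad = F.T @ (F @ s - y) / n         # gradient of 0.5/n * ||F s - y||^2
        s = s - lr * grad
        # Soft-thresholding: proximal step for the L1 sparsity penalty.
        s = np.sign(s) * np.maximum(np.abs(s) - lr * lam, 0.0)
    return s

def prune(scores, threshold=1e-3):
    """Return indices of filters whose score magnitude exceeds the threshold."""
    return np.flatnonzero(np.abs(scores) > threshold)

# Toy data: 60 filters, of which only three actually drive the output.
rng = np.random.default_rng(0)
F = rng.normal(size=(200, 60))
y = F[:, [3, 17, 42]].sum(axis=1)

scores = fit_sparse_scores(F, y)
kept = prune(scores)

# The sub-network reuses the retained filters unchanged (no fine-tuning, no rescaling).
sub_output = F[:, kept].sum(axis=1)
print("retained filters:", kept.tolist())
print("pruning rate:", 1 - len(kept) / F.shape[1])
print("max output deviation:", float(np.abs(sub_output - y).max()))
```

In this sketch the learned scores are used only to decide which filters to keep; the retained filters are then evaluated with their original weights, mirroring the constraint that no fine-tuning is allowed.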
For samples from the same class, we can then visualize the inference pathway in the pre-trained DNN by applying existing interpretation techniques to the retained filters and layers.
The resulting visualization reveals how the inference process stitches together and integrates information layer by layer and filter by filter.
We provide detailed and insightful case studies together with several quantitative analyses over thousands of trials to demonstrate the quality, sparsity, fidelity and accuracy of the interpretation. In extensive empirical studies on VGG, ResNet, and ViT, NeuroChains significantly enriches the interpretation and makes the inner mechanism of DNNs more transparent.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=zJt4WDcBhu&referrer=%5BTMLR%5D(%2Fgroup%3Fid%3DTMLR)
Changes Since Last Submission: 1. Experiments applying NeuroChains to ViT have been added to the main text.
Assigned Action Editor: ~Stanislaw_Kamil_Jastrzebski1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 403