Keywords: interpretability, deep learning
Abstract: Despite the success of deep neural networks (DNNs) on a broad range of tasks, little is understood about why and how they achieve that success, owing to their complex architectures and opaque, black-box training processes. To address this gap, we propose Concept-Monitor, a general framework that, for the first time, automatically uncovers what happens during black-box DNN training. Concept-Monitor provides human-interpretable visualizations of the DNN training process, facilitating transparency and a deeper understanding of how DNNs function and evolve over training iterations. Using Concept-Monitor, we can observe and compare different training paradigms with ease, including standard training, fine-tuning, adversarial training, and network pruning for the Lottery Ticket Hypothesis, which yields new insights into why and how adversarial training and network pruning work and how they modify the network during training. For example, we find that the Lottery Ticket Hypothesis discovers a mask that makes neurons interpretable at initialization, \textit{without} any fine-tuning, and that adversarially robust models have more neurons relying on color than standard models trained on the same dataset.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (eg, AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)