- Keywords: Interpretability Evaluation, Deep Neural Networks, Alternating Direction Method of Multipliers
- TL;DR: We propose a novel framework to evaluate the interpretability of neural networks.
- Abstract: Deep neural networks (DNNs) have achieved remarkable success over the last decade owing to automatic feature learning and expressive freedom. However, they remain difficult to interpret because they are complex compositions of linear and nonlinear transformations. Even though many models have been proposed to explore the interpretability of DNNs, several challenges remain unsolved: 1) the lack of quantitative interpretability measures for DNNs, 2) the lack of theory for the stability of DNNs, and 3) the difficulty of solving nonconvex DNN problems with interpretability constraints. To address these challenges simultaneously, this paper presents a novel intrinsic interpretability evaluation framework for DNNs. Specifically, four independent properties of interpretability are defined based on existing works. Moreover, we investigate the theory for the stability of DNNs, which is an important aspect of interpretability, and prove that DNNs are generally stable given different activation functions. Finally, an extended version of the deep learning Alternating Direction Method of Multipliers (dlADMM) is proposed to solve DNN problems with interpretability constraints efficiently and accurately. Extensive experiments on several benchmark datasets assess several DNNs using our proposed interpretability framework.