Training Interpretable Convolutional Neural Networks towards Class-specific Filters

Anonymous

Sep 25, 2019 ICLR 2020 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Convolutional neural networks (CNNs) have often been treated as “black-box” and successfully used in a range of tasks. However, CNNs still suffer from the problem of filter ambiguity – an intricate many-to-many mapping relationship between filters and features, which undermines the models’ interpretability. To interpret CNNs, most existing works attempt to interpret a pre-trained model, while neglecting to reduce the filter ambiguity hidden behind. To this end, we propose a simple but effective strategy for training interpretable CNNs. Specifically, we propose a novel Label Sensitive Gate (LSG) structure to enable the model to learn disentangled filters in a supervised manner, in which redundant channels experience a periodical shutdown as flowing through a learnable gate varying with input labels. To reduce redundant filters during training, LSG is constrained with a sparsity regularization. In this way, such training strategy imposes each filter’s attention to just one or few classes, namely class-specific. Extensive experiments demonstrate the fabulous performance of our method in generating sparse and highly label- related representation of the input. Moreover, comparing to the standard training strategy, our model displays less redundancy and stronger interpretability.
  • Keywords: class-specific filters, interpretability, disentangled representation, filter ambiguity, gate
0 Replies

Loading