ProtoNMF: Turning a Black Box into a Prototype Based Interpretable Model via Non-negative Matrix Factorization

19 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: XAI, prototype based inherently interpretable model, non-negative matrix factorization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Models using parts of images as prototypes for interpretable image classification are receiving increasing attention due to their ability to provide a transparent reasoning process in a "this looks like that" manner. However, existing models are typically constructed by inserting an additional prototype layer before the final classification head, which often involves complex multi-stage training procedures and intricate loss designs, and they still under-perform their black box counterparts in terms of accuracy. To guarantee recognition performance, we take the first step in exploring the reverse direction and investigate how to turn a trained black box model into the form of a prototype based model. To this end, we propose to leverage Non-negative Matrix Factorization (NMF) to discover interpretable prototypes, owing to its capability of yielding parts based representations. We then use these prototypes as a basis to reconstruct the trained black box's classification head via linear convex optimization for transparent reasoning. Denoting the reconstruction difference as the residual prototype, all discovered prototypes together guarantee a precise final reconstruction. To the best of our knowledge, this is the first prototype based model that guarantees recognition performance on par with black boxes for interpretable image classification. We demonstrate that our simple strategy can easily turn a trained black box into a prototype based model while discovering meaningful prototypes on various benchmark datasets and networks.
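The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy random features, the multiplicative-update NMF, the use of unconstrained least squares as a stand-in for the "linear convex optimization", and all names (`nmf`, `reconstruct_head`, `coeffs`, `residual`) are assumptions made for illustration. It only shows why adding a residual prototype makes the reconstructed head reproduce the black box's logit exactly.

```python
import numpy as np

def nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative A (n x d) as E @ P with E (n x k) and
    P (k x d) both non-negative, using Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    E = rng.random((n, k)) + eps
    P = rng.random((k, d)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs are preserved.
        P *= (E.T @ A) / (E.T @ E @ P + eps)
        E *= (A @ P.T) / (E @ P @ P.T + eps)
    return E, P

def reconstruct_head(w, P):
    """Express one class's weight vector w (d,) in the prototype basis P (k x d).
    Assumption: plain least squares stands in for the paper's convex problem."""
    coeffs, *_ = np.linalg.lstsq(P.T, w, rcond=None)
    residual = w - coeffs @ P  # the "residual prototype"
    return coeffs, residual

rng = np.random.default_rng(0)
A = rng.random((50, 16))   # toy non-negative features (e.g. post-ReLU patches)
w = rng.normal(size=16)    # toy classification-head weights for one class

E, P = nmf(A, k=3)         # rows of P play the role of prototypes
coeffs, residual = reconstruct_head(w, P)

x = rng.random(16)         # a new feature vector
logit_black_box = w @ x
# Prototype-based form: weighted prototype similarities plus the residual term.
logit_prototype = coeffs @ (P @ x) + residual @ x
```

Because `residual` is defined as whatever the prototype basis fails to capture, the two logits agree up to floating-point error regardless of how well the prototypes alone fit `w`; the quality of the fit only determines how much of the decision is carried by the interpretable prototypes rather than the residual.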
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1839