InfoDisent: Explainability of Image Classification Models by Information Disentanglement

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · License: CC BY 4.0
Keywords: interpretability; prototypical parts; XAI
TL;DR: A method that transforms any neural network into a prototypical-parts model, including on ImageNet
Abstract: In this work, we introduce InfoDisent, a hybrid approach to explainability based on the information bottleneck principle. InfoDisent disentangles the information in the final layer of any pretrained model into atomic concepts that can be interpreted as prototypical parts. This approach merges the flexibility of post-hoc methods with the concept-level modeling capabilities of self-explainable neural networks such as ProtoPNets. We demonstrate the effectiveness of InfoDisent through computational experiments and user studies across various datasets, using modern backbones including ViTs and convolutional networks. While InfoDisent achieves competitive performance within the class of interpretable models, we observe an accuracy-interpretability trade-off relative to black-box counterparts, which is especially visible for CNNs. Notably, InfoDisent generalizes the prototypical-parts approach to novel domains such as ImageNet.
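To make the abstract's pipeline concrete, below is a minimal PyTorch sketch of the post-hoc setup it describes: a frozen pretrained backbone whose final-layer feature maps are remapped into per-channel "prototype" activations that feed an interpretable classifier. Everything here is illustrative, not the authors' implementation: the class name `InfoDisentHead`, the `disentangle` 1x1 transform, and parameters such as `num_prototypes` are assumptions, and the information-bottleneck training objective is omitted.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class InfoDisentHead(nn.Module):
    """Hedged sketch of an InfoDisent-style head (names are illustrative).

    A learned 1x1 transform re-mixes final-layer feature maps into candidate
    atomic concepts; spatial max-pooling turns each concept into a
    prototypical-part score, which a linear layer maps to class logits.
    """

    def __init__(self, in_channels: int, num_prototypes: int, num_classes: int):
        super().__init__()
        # 1x1 conv disentangles backbone channels into concept channels.
        self.disentangle = nn.Conv2d(in_channels, num_prototypes, kernel_size=1)
        # One weight per (class, prototype) pair; no bias, for readability.
        self.classifier = nn.Linear(num_prototypes, num_classes, bias=False)

    def forward(self, feats: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        concept_maps = self.disentangle(feats)        # (B, P, H, W)
        # Each prototype fires on its strongest spatial location (image patch).
        scores = concept_maps.flatten(2).amax(dim=2)  # (B, P)
        return self.classifier(scores), concept_maps


# Post-hoc usage with a frozen pretrained backbone, as the abstract describes.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
feature_extractor = nn.Sequential(*list(backbone.children())[:-2])  # drop pool/fc
for p in feature_extractor.parameters():
    p.requires_grad = False  # only the head is trained

head = InfoDisentHead(in_channels=2048, num_prototypes=512, num_classes=1000)
logits, concept_maps = head(feature_extractor(torch.randn(1, 3, 224, 224)))
```

The returned `concept_maps` can be upsampled to the input resolution to visualize where each prototypical part activates, which is the kind of concept-level explanation the abstract attributes to ProtoPNet-style models.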
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 12074