Discover-then-Name Revisited: Enhancing Concept Bottleneck Models Interpretability

TMLR Paper4280 Authors

21 Feb 2025 (modified: 09 May 2025) | Rejected by TMLR | CC BY 4.0
Abstract: This study aims to reproduce and extend the research on Discover-Then-Name Concept Bottleneck Models (DN-CBM) introduced by Rao et al. (2024). DN-CBM enhances traditional CBMs by incorporating sparse autoencoders (SAEs) to enable automatic concept discovery and to improve concept generation and interpretability. We replicate the key experiments on CIFAR-10, CIFAR-100, Places365, and ImageNet, confirming the claims of automated concept discovery, task-agnostic applicability, and improved vocabulary leading to greater granularity. However, we find that the claim of superior interpretability over CLIP is inconclusive. Beyond replication, we introduce new experiments, including an analysis of color perturbations on concept robustness and the integration of Local Interpretable Model-Agnostic Explanations (LIME) to trace which features correspond to each concept. Our findings reveal the model's limited robustness to color variations and demonstrate that adding LIME increases interpretability and makes it possible to detect (spurious) correlations. The complete implementation of the original authors' experiments, as well as ours, is available in our repository: https://github.com/EKarasevnl/Reproducibility-DN-CBM.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

Addressed received feedback and clarified concerns regarding LIME. Strengthened the paper's conclusion.

Assigned Action Editor: Georgios Leontidis
Submission Number: 4280