Abstract: This study aims to reproduce and extend the research on Discover-Then-Name Concept Bottleneck Models (DN-CBM) introduced by Rao et al. (2024). DN-CBM enhances traditional concept bottleneck models (CBMs) by incorporating sparse autoencoders (SAEs) to enable automatic concept discovery, improving concept generation and interpretability. We replicate the key experiments on CIFAR-10, CIFAR-100, Places365, and ImageNet, confirming the claims of automated concept discovery, task-agnostic applicability, and an improved vocabulary leading to greater granularity. However, we find the claim of superior interpretability over CLIP to be inconclusive. Beyond replication, we introduce new experiments, including an analysis of the effect of color perturbations on concept robustness and the integration of Local Interpretable Model-Agnostic Explanations (LIME) to trace which image features correspond to each concept. Our findings reveal the model's limited robustness to color variations and demonstrate how adding LIME increases interpretability and enables the detection of (spurious) correlations. The complete implementation of the original authors' experiments, as well as our own, is available in our repository: https://github.com/EKarasevnl/Reproducibility-DN-CBM.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
Addressed reviewer feedback, clarified concerns regarding LIME, and strengthened the paper's conclusion.
Assigned Action Editor: Georgios Leontidis
Submission Number: 4280