Abstract: Concept Bottleneck Models (CBMs) (Koh et al., 2020) are a class of interpretable deep learning frameworks that improve transparency by mapping input data into human-understandable concepts. Recent advances, including the Discover-then-Name CBM (DN-CBM) proposed by Rao et al. (2024), eliminate reliance on external language models by automating concept discovery and naming using a CLIP feature extractor and a sparse autoencoder. This study focuses on replicating the key findings reported by Rao et al. (2024). We conclude that the core conceptual ideas are reproducible, but not to the extent presented in the original work. Many representations of active neurons appear to be misaligned with their assigned concepts, indicating a lack of faithfulness in the DN-CBM's explanations. To address this, we propose a model extension: an enhanced alignment method that we evaluate through a user study. Our extended model provides more interpretable concepts (with statistical significance), at the cost of a slight decrease in accuracy.
Certifications: Reproducibility Certification
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/daniuyter/DNCBM-repro
Supplementary Material: zip
Assigned Action Editor: ~Sungsoo_Ahn1
Submission Number: 4302