Signed-dictionary and Nonnegative-activation Decomposition for Concept Bottleneck Models

06 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: interpretability, Concept Bottleneck Model
Abstract: Concept-driven interpretability often relies on a fixed text pool, which limits coverage of fine-grained and compositional concepts and weakens the coupling between explanations and decisions. We introduce SAND-CBM, a label-free framework that learns concepts directly from image representations in an aligned vision–language space. SAND-CBM factorizes features into a signed concept dictionary $W$ and nonnegative activations $U$, then applies a scale-equivalent normalization that maps each activation column to $[0,1]$ so that activation strengths are comparable across concepts. A class-conditional sparse gate enables per-class concept selection over a shared dictionary, supporting reuse without per-class redundancy. On top of the same $(U, W)$, we expose two lightweight, complementary usage modes: Branch-A concatenates image–text similarities with $U$ in a CBM-style interface, while Branch-B concatenates back-mapped reconstructions $Z = UW^*$ with $U$ in a CEM-style interface. Across CIFAR-100, CUB, and SUN, SAND-CBM attains 80.52\%, 80.76\%, and 67.64\% Acc@1, respectively, yielding an average gain of 10.14\% over the baselines. Our code is available at: https://anonymous.4open.science/r/SAND-FA73/
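As a rough illustration of the $(U, W)$ decomposition described in the abstract, the following minimal NumPy sketch factorizes features $F$ into nonnegative activations $U$ and a signed dictionary $W$ via projected gradient descent on $\|F - UW\|_F^2$, then normalizes each activation column into $[0,1]$ while absorbing the scale into $W$. The optimization scheme, function names, and hyperparameters are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sand_factorize(F, k, steps=500, lr=1e-2, seed=0):
    """Factor features F (n x d) into nonnegative activations U (n x k)
    and a signed dictionary W (k x d) by projected gradient descent on
    ||F - U W||_F^2. Hypothetical sketch, not the paper's procedure."""
    rng = np.random.default_rng(seed)
    n, d = F.shape
    U = np.abs(rng.normal(size=(n, k))) * 0.1   # nonnegative init
    W = rng.normal(size=(k, d)) * 0.1           # signed init
    for _ in range(steps):
        R = U @ W - F                     # reconstruction residual
        gU, gW = R @ W.T, U.T @ R         # gradients of the squared error
        U = np.maximum(U - lr * gU, 0.0)  # project U onto U >= 0
        W = W - lr * gW                   # W stays signed
    return U, W

def normalize_columns(U, W, eps=1e-8):
    """Map each activation column to [0, 1] by its column max and absorb
    the scale into the matching dictionary row, leaving U @ W unchanged."""
    s = U.max(axis=0) + eps               # per-concept scale
    return U / s, W * s[:, None]

# Usage: factorize, normalize, then back-map Z = U W (Branch-B style).
F = np.random.default_rng(1).normal(size=(64, 32))
U, W = sand_factorize(F, k=8)
U, W = normalize_columns(U, W)
Z = U @ W                                 # reconstructions fed alongside U
```

Because rescaling column $j$ of $U$ by $1/c$ and row $j$ of $W$ by $c$ leaves the product $UW$ fixed, this per-column max normalization picks a canonical scale for each concept without altering the reconstructions.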
Primary Area: interpretability and explainable AI
Submission Number: 2656