Keywords: Explainable Artificial Intelligence, Concept-based Explainability, Concept Discovery, Concept Hierarchy, Concept Bottleneck Models, Concept Embedding Models, Sparse Autoencoders
Abstract: Concept-based models explain predictions using human-understandable concepts, but they typically rely on exhaustive annotations and treat concepts as flat and independent. To address these limitations, recent work has introduced *Hierarchical Concept Embedding Models* (HiCEMs), which explicitly model concept relationships, and *Concept Splitting*, which discovers sub-concepts using only coarse annotations. However, both methods are restricted to shallow hierarchies. We overcome this limitation with *Multi-Level Concept Splitting* (MLCS), which discovers multi-level concept hierarchies from top-level supervision alone, and *Deep-HiCEMs*, an architecture that represents the discovered hierarchies and enables interventions at multiple levels of abstraction. Experiments show that MLCS discovers human-interpretable concepts that were absent from the training annotations, and that Deep-HiCEMs maintain high task accuracy while supporting test-time concept interventions that can further improve performance.
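As background for the test-time interventions the abstract refers to, below is a minimal sketch of a flat concept bottleneck model with a concept intervention, where selected predicted concept probabilities are replaced by ground-truth values before the label predictor runs. This is an illustrative sketch of the general mechanism, not the paper's Deep-HiCEM architecture; the class name, layer sizes, and the `true_concepts`/`intervene_mask` arguments are all assumptions made for the example.

```python
# Illustrative concept-bottleneck sketch (not the paper's Deep-HiCEM).
# A feature vector is mapped to concept probabilities, which a linear head
# maps to task logits; an intervention overwrites chosen predicted concepts
# with ground-truth values at test time.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Backbone predicting one logit per concept.
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        # Label predictor operating only on the concept bottleneck.
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x, true_concepts=None, intervene_mask=None):
        c_prob = torch.sigmoid(self.concept_net(x))  # predicted concepts
        if true_concepts is not None and intervene_mask is not None:
            # Test-time intervention: replace masked concepts with ground truth.
            c_prob = torch.where(intervene_mask, true_concepts, c_prob)
        return self.label_net(c_prob), c_prob

model = ConceptBottleneck(in_dim=16, n_concepts=4, n_classes=3)
x = torch.randn(2, 16)
truth = torch.tensor([[1., 0., 1., 0.], [0., 1., 0., 1.]])
mask = torch.tensor([[True, False, False, False]] * 2)  # intervene on concept 0
logits, concepts = model(x, true_concepts=truth, intervene_mask=mask)
print(logits.shape, concepts.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```

Because the label predictor sees only the (possibly corrected) concept values, fixing a wrong concept can directly change the downstream prediction; the paper's contribution, per the abstract, is extending this kind of intervention to multiple levels of a discovered concept hierarchy.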
Submission Number: 26