Keywords: Explainability, interpretability, causality, causal artificial intelligence, mediation analysis, deep learning, representation learning
Abstract: Explainable deep learning models are important for the development, certification, and adoption of autonomous systems. Yet without methods to quantify causal relationships between explanations and actions, interpretability remains correlational. Furthermore, explanations typically address lower-level actions. This poorly serves both human understanding, which benefits from higher-level abstractions, and underactuated robotics, whose behaviors often require richer descriptions. To address these gaps, we introduce the Causal Concept-Wrapper Network (CCW-Net), a general training method for differentiable architectures that adapts mediation analysis from fields such as economics, medicine, and epidemiology to align the causal effects of abstract, information-rich explanations with policy actions. CCW-Net expands the expressiveness of prior work in both explainable deep learning and mediation analysis, allowing each explanation to serve as a mediator that encodes both its presence and its context-based expression. In a high-fidelity, underactuated aircraft formation task, CCW-Net produces high-level explanations that are both interpretable and quantifiably causal without degrading task performance. We demonstrate CCW-Net across diverse architectures, including capsule networks with dynamic routing, modified concept bottleneck models, and cross-attention mechanisms; notably, we present the first adaptation of capsule networks to sequential decision-making in robotics. This breadth shows that CCW-Net generalizes across neural network architectures, offering a path toward transparent and trustworthy autonomy.
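As a rough illustration of the concepts-as-mediators idea described in the abstract, the sketch below shows a policy whose actions flow only through per-concept mediators, each encoding a presence score and a context-dependent expression vector. The module name, layer sizes, and gating scheme are illustrative assumptions, not the paper's implementation; concept supervision and the mediation-analysis training objective are omitted.

```python
# Hypothetical sketch (not the authors' code): concepts act as mediators
# between observations and actions. Each concept k is represented by a
# presence score p_k in [0, 1] and an "expression" vector e_k capturing how
# the concept manifests in the current context; the policy reads the world
# only through these mediators.
import torch
import torch.nn as nn


class ConceptMediatorPolicy(nn.Module):
    def __init__(self, obs_dim, n_concepts, concept_dim, action_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        # Per-concept presence logits and context-dependent expression vectors.
        self.presence_head = nn.Linear(128, n_concepts)
        self.expression_head = nn.Linear(128, n_concepts * concept_dim)
        # Policy head consumes only the concept mediators.
        self.policy_head = nn.Linear(n_concepts * concept_dim, action_dim)
        self.n_concepts, self.concept_dim = n_concepts, concept_dim

    def forward(self, obs):
        h = self.encoder(obs)
        presence = torch.sigmoid(self.presence_head(h))               # (B, K)
        expression = self.expression_head(h).view(
            -1, self.n_concepts, self.concept_dim)                    # (B, K, D)
        # Gate each expression vector by its presence score, so an "absent"
        # concept contributes nothing to the downstream action.
        mediators = presence.unsqueeze(-1) * expression
        action = self.policy_head(mediators.flatten(1))
        return action, presence, expression
```

In this toy form, intervening on a concept (e.g., clamping its presence to zero) changes the action only through the mediator pathway, which is the structure that makes mediation-style causal-effect estimates meaningful.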
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 22254