Keywords: Benchmarking Interpretability, Other
Other Keywords: causal abstraction, explanation, complex systems, benchmark
TL;DR: We benchmark 30+ explanation metrics across simulated complex systems and show that only faithfulness-aware causal abstraction reliably detects invalid mechanistic explanations. We introduce the Causal Abstraction Error (CAE) as a robust, sample-efficient metric.
Abstract: A central goal of science is to produce valid explanations of complex systems: high-level causal accounts that faithfully reflect the behavior of lower-level mechanisms. Yet no consensus exists on how to measure whether a proposed high-level explanation is actually valid. We introduce a benchmark of ten complex systems spanning discrete and continuous state spaces as well as static and dynamical regimes, each equipped with consensus ground-truth causal explanations and invalid contrastive conditions. Within a unified causal abstraction framework, we systematically evaluate over thirty candidate metrics drawn from observational, functional, information-theoretic, and causal families. Our results show that only the causal family reliably discriminates valid from invalid abstractions, and only when it incorporates faithfulness testing over unmapped variables. Building on these findings, we introduce the Causal Abstraction Error (CAE), a continuous validity metric with an explicit faithfulness test, which passes all discrimination tests across every system and converges with as few as 30 sampled interventions. We offer it as a general-purpose metric for the discovery and validation of high-level explanations of complex systems.
Submission Number: 318