Actionable Interpretability Must Be Defined in Terms of Symmetries: A Compositional Probabilistic Approach
Keywords: Interpretability, category theory, probabilistic machine learning
TL;DR: Interpretability lacks theoretical foundations. We posit that, for a definition of interpretability to be actionable, it must be given in terms of symmetries.
Abstract: This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed, because existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of **symmetries** that inform model design and lead to testable conditions.
Under a **compositional** view, we hypothesise that four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) suffice to (i) formalise interpretable models as a subclass of probabilistic models, (ii) yield a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion, and (iii) provide a formal framework to verify compliance with safety standards and regulations.
Submission Number: 52