Keywords: Causal interventions, Other
Other Keywords: Faithfulness, Explainable AI, Causal Abstraction
TL;DR: We propose desiderata for faithfulness and position Causal Abstraction as a framework for aligning faithfulness claims.
Abstract: Faithfulness is a broadly agreed-upon desideratum for explanations of machine
learning (ML) model predictions. Yet, although the community has adopted many
different methods, there is no consensus definition of faithfulness [1]. Here, we
propose desiderata for faithfulness that go beyond the standard intuition of “accurately
representing the reasoning process of the model” [2; 3]. We highlight a recently
introduced mechanistic interpretability (MI) framework known as Causal
Abstraction (CA), and argue that CA provides a framework capable of aligning
faithfulness claims across the community.
Submission Number: 185