CAuSE: Post-hoc Natural Language Explanation of Multimodal Classifiers through Causal Abstraction

28 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Interpretability, Causal Abstraction, Multimodality, Classification
Abstract: The increasing integration of AI models in critical areas such as healthcare, finance, and security has raised concerns about their black-box nature, which limits trust and accountability. To ensure robust and trustworthy AI, interpretability is essential. In this paper, we propose CAuSE (Causal Abstraction under Simulated Explanation), a novel framework for post-hoc explanation of multimodal classifiers. Unlike existing interpretability methods, such as Amnesic Probing and Integrated Gradients, CAuSE generates causally faithful natural language explanations of a fine-tuned multimodal classifier's decisions. CAuSE integrates Interchange Intervention Training (IIT) within a Language Model (LM) based module to simulate the causal reasoning behind the classifier's outputs. We introduce a novel metric, the Counterfactual F1 score, to measure causal faithfulness, and demonstrate that CAuSE achieves state-of-the-art performance on this metric. We also provide a rigorous theoretical underpinning for causal abstraction between two neural networks and implement this within our CAuSE framework. This ensures that CAuSE's natural language explanations are not only simulations of the classifier's behavior but also reflect its underlying causal processes. Our method is task-agnostic and achieves state-of-the-art results on benchmark multimodal classification datasets, such as e-SNLI-VE and Facebook Hateful Memes, offering a scalable, faithful solution for interpretability in multimodal classifiers.
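The abstract's IIT component builds on the interchange-intervention operation: run the model on a source input, capture an internal activation, then re-run the model on a base input with that activation patched in and inspect the counterfactual output. The sketch below is a minimal PyTorch illustration of that operation only, not CAuSE's actual implementation; TinyClassifier, the chosen intervention site, and all shapes are hypothetical placeholders.

```python
# Minimal sketch of an interchange intervention (the operation underlying IIT).
# Assumes a generic PyTorch classifier; model, layer choice, and shapes are illustrative.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, d_in=16, d_hid=32, n_classes=3):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hid)   # hypothetical intervention site
        self.head = nn.Linear(d_hid, n_classes)

    def forward(self, x):
        return self.head(torch.relu(self.encoder(x)))

def interchange_intervention(model, base_x, source_x):
    """Run `base_x`, but overwrite the encoder activation with the one
    produced by `source_x`; return the counterfactual logits."""
    captured = {}

    def save_hook(module, inputs, output):
        captured["act"] = output          # remember the source activation

    def patch_hook(module, inputs, output):
        return captured["act"]            # replace the base activation

    # 1) capture the activation on the source input
    handle = model.encoder.register_forward_hook(save_hook)
    model(source_x)
    handle.remove()

    # 2) re-run on the base input with the source activation patched in
    handle = model.encoder.register_forward_hook(patch_hook)
    counterfactual_logits = model(base_x)
    handle.remove()
    return counterfactual_logits

model = TinyClassifier()
base, source = torch.randn(1, 16), torch.randn(1, 16)
print(interchange_intervention(model, base, source))
```

In IIT, such patched forward passes are used during training so that the explainer module's internal variables align with the classifier's causal structure; the sketch only shows the single-intervention forward pass.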
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14003