Explanations are a Means to an End: Decision Theoretic Explanation Evaluation

Ziyang Guo; Berk Ustun; Jessica Hullman

Published: 04 Jun 2026, Last Modified: 04 Jun 2026

Venue: PhilML@ICML 2026 Poster

Readers: Everyone

License: CC BY 4.0

Keywords: Explainability, decision theory, interpretability

Abstract: Explanations of model behavior are commonly evaluated via proxy properties weakly tied to the purposes explanations serve in practice. We contribute a decision theoretic framework that treats explanations as information signals valued by the expected improvement they enable on a specified decision task. This approach yields three distinct estimands: (i) a theoretical benchmark that upper-bounds achievable performance by any agent with the explanation, (ii) a human-complementary value that quantifies the theoretically attainable value that is not already captured by a baseline human decision policy, and (iii) a behavioral value representing the causal effect of providing the explanation to human decision-makers. We instantiate these definitions in a practical validation workflow, and apply them to assess explanation potential and interpret behavioral effects in human–AI decision support and mechanistic interpretability.
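To make the three estimands in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the joint distribution, the 0/1 payoff, the function names, and in particular the exact form of the human-complementary value (best attainable payoff with the explanation minus the payoff of a baseline human policy) are assumptions chosen for illustration. The behavioral value is sketched as a simple difference in mean payoff between two randomized groups, which is also an assumption about the estimator.

```python
from fractions import Fraction as F
from statistics import mean

# Hypothetical joint distribution p(state, signal); the signal stands in for
# an explanation. All numbers are illustrative, not taken from the paper.
JOINT = {
    (0, 0): F(5, 10), (0, 1): F(1, 10),
    (1, 0): F(1, 10), (1, 1): F(3, 10),
}
ACTIONS = (0, 1)


def payoff(action, state):
    """Assumed scoring rule: 1 for a correct decision, 0 otherwise."""
    return 1 if action == state else 0


def expected_payoff(joint, policy):
    """Expected payoff of a policy that maps a signal to an action."""
    return sum(p * payoff(policy(s), y) for (y, s), p in joint.items())


def best_without_signal(joint):
    """Best expected payoff for an agent that must ignore the signal."""
    return max(expected_payoff(joint, lambda _s, a=a: a) for a in ACTIONS)


def best_with_signal(joint):
    """Best expected payoff for an agent that responds optimally to each signal."""
    signals = sorted({s for (_y, s) in joint})
    total = F(0)
    for s in signals:
        total += max(
            sum(p * payoff(a, y) for (y, s2), p in joint.items() if s2 == s)
            for a in ACTIONS
        )
    return total


def benchmark_value(joint):
    """(i) Upper bound on the improvement any agent can get from the signal."""
    return best_with_signal(joint) - best_without_signal(joint)


def complementary_value(joint, human_baseline_payoff):
    """(ii) Assumed form: attainable payoff not captured by the human baseline."""
    return best_with_signal(joint) - human_baseline_payoff


def behavioral_value(payoffs_with, payoffs_without):
    """(iii) Assumed estimator: difference in mean payoff across randomized groups."""
    return mean(payoffs_with) - mean(payoffs_without)


# Hypothetical baseline human policy: always choose action 1.
human_baseline = expected_payoff(JOINT, lambda _s: 1)

print(benchmark_value(JOINT))
print(complementary_value(JOINT, human_baseline))
print(behavioral_value([1, 1, 0, 1], [1, 0, 0, 1]))
```

In this toy setup the behavioral value can be read against the benchmark: an observed effect well below the benchmark would suggest that the explanation carries usable information that decision-makers are not exploiting.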

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 72
