Approximating Shapley Explanations in Reinforcement Learning

Published: 18 Sept 2025, Last Modified: 29 Oct 2025
Venue: NeurIPS 2025 poster
License: CC BY 4.0
Keywords: Reinforcement learning, Shapley values, explainable artificial intelligence, explainable reinforcement learning, feature-based explanations
TL;DR: FastSVERL provides a practical and scalable approach for principled, rigorous interpretability in reinforcement learning.
Abstract: Reinforcement learning has achieved remarkable success in complex decision-making environments, yet its lack of transparency limits its deployment in practice, especially in safety-critical settings. Shapley values from cooperative game theory provide a principled framework for explaining reinforcement learning; however, the computational cost of Shapley explanations is an obstacle to their use. We introduce FastSVERL, a scalable method for explaining reinforcement learning by approximating Shapley values. FastSVERL is designed to handle the unique challenges of reinforcement learning, including temporal dependencies across multi-step trajectories, learning from off-policy data, and adapting to evolving agent behaviours in real time. FastSVERL offers a practical, scalable approach for principled and rigorous interpretability in reinforcement learning.
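The abstract does not specify FastSVERL's learned approximation, but the computational obstacle it targets is concrete: an exact Shapley value sums marginal contributions over all 2^n feature coalitions. As background only, below is a minimal sketch of the standard Monte Carlo permutation estimator for feature-based Shapley explanations of a policy's action choice, which trades exactness for a number of characteristic-function evaluations linear in the sample count. All names here (`shapley_mc`, `make_value_fn`, `policy`, `baseline`) are hypothetical placeholders for illustration, not the authors' method or API.

```python
import numpy as np

def shapley_mc(value_fn, n_features, n_samples=256, seed=None):
    """Monte Carlo permutation estimator for Shapley values.

    value_fn maps a boolean mask over features (True = feature revealed)
    to a scalar characteristic value, e.g. the probability the policy
    assigns to a particular action given only the revealed features.
    """
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_features)
    for _ in range(n_samples):
        perm = rng.permutation(n_features)
        mask = np.zeros(n_features, dtype=bool)
        prev = value_fn(mask)  # value of the empty coalition
        for i in perm:
            mask[i] = True
            curr = value_fn(mask)
            phi[i] += curr - prev  # marginal contribution of feature i
            prev = curr
    return phi / n_samples

def make_value_fn(policy, state, baseline, action):
    # One common characteristic function: hidden features are replaced
    # by a baseline observation before querying the policy.
    def value_fn(mask):
        masked = np.where(mask, state, baseline)
        return policy(masked)[action]
    return value_fn

# Toy linear-softmax policy over two actions, purely illustrative.
def policy(obs):
    logits = np.array([obs.sum(), -obs.sum()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

state = np.array([1.0, -0.5, 2.0])
baseline = np.zeros(3)
vf = make_value_fn(policy, state, baseline, action=0)
print(shapley_mc(vf, n_features=3, n_samples=500, seed=0))
```

Per the abstract, FastSVERL goes beyond sampling estimators of this kind by learning the approximation, which is what allows it to handle multi-step temporal dependencies, off-policy data, and agents whose behaviour is still changing.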
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 20668