Integrating Policy Summaries with Reward Decomposition Explanations

Yael Septon; Ofra Amir

Published: 30 Apr 2022, Last Modified: 05 May 2023, XAIP 2022, Readers: Everyone

Keywords: Explainable AI, Strategy Summarization, Reinforcement Learning, Deep Learning, Reward Decomposition

TL;DR: Combination of reward decomposition, a local explanation method that exposes agent preferences, with HIGHLIGHTS, a global explanation method that shows a summary of the agent's behavior in "important" states.

Abstract: Explaining the behavior of agents operating in sequential decision-making settings is challenging, as their behavior is affected by a dynamic environment and delayed reward. In this paper, we study a new way of combining local and global explanations of sequential decision-making agents in order to help understand their behavior. Specifically, we combine reward decomposition, a local explanation method that exposes agent preferences, with HIGHLIGHTS, a global explanation method that shows a summary of the agent's behavior in "important" states. We conducted a user study to evaluate the integration of these explanation methods and their respective benefits. Our results show that local information in the form of reward decomposition contributed to participants' understanding of agents' preferences, while HIGHLIGHTS summaries did not lead to an improvement compared to a baseline showing frequent agent trajectories.
