TL;DR: We address the challenge of explaining the total counterfactual effect of an agent’s action on the outcome of a realized scenario in multi-agent Markov decision processes.
Abstract: We address the challenge of explaining counterfactual outcomes in multi-agent Markov decision processes. In particular, we aim to explain the total counterfactual effect of an agent's action on the outcome of a realized scenario through its influence on the environment dynamics and the agents' behavior. To achieve this, we introduce a novel causal explanation formula that decomposes the counterfactual effect by attributing to each agent and state variable a score reflecting their respective contributions to the effect. First, we show that the total counterfactual effect of an agent's action can be decomposed into two components: one measuring the effect that propagates through all subsequent agents' actions and another related to the effect that propagates through the state transitions. Building on recent advancements in causal contribution analysis, we further decompose these two effects as follows. For the former, we consider agent-specific effects -- a causal concept that quantifies the counterfactual effect of an agent's action that propagates through a subset of agents. Based on this notion, we use the Shapley value to attribute the effect to individual agents. For the latter, we consider the concept of structure-preserving interventions and attribute the effect to state variables based on their "intrinsic" contributions. Through extensive experimentation, we demonstrate the interpretability of our approach in a Gridworld environment with LLM-assisted agents and a sepsis management simulator.
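To make the Shapley-based attribution step concrete, here is a minimal, self-contained sketch of how an effect measured over subsets of agents can be turned into per-agent scores via the Shapley value. This is an illustration only, not the paper's implementation (see the linked repository for that): the function names `shapley_values` and `effect_fn`, and the toy effect values, are hypothetical.

```python
import itertools
from math import factorial


def shapley_values(agents, effect_fn):
    """Attribute a total effect to agents via Shapley values.

    `effect_fn(subset)` is assumed to return the counterfactual effect
    that propagates through the given subset of agents (a frozenset).
    """
    n = len(agents)
    values = {a: 0.0 for a in agents}
    for a in agents:
        others = [b for b in agents if b != a]
        for r in range(len(others) + 1):
            for subset in itertools.combinations(others, r):
                s = frozenset(subset)
                # Standard Shapley weight for a coalition of size |s|.
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                # Marginal contribution of agent `a` to coalition `s`.
                values[a] += weight * (effect_fn(s | {a}) - effect_fn(s))
    return values


# Toy usage with a hypothetical effect function: the effect propagating
# through each subset of agents is hard-coded here purely for illustration.
toy_effects = {
    frozenset(): 0.0,
    frozenset({"A1"}): 0.3,
    frozenset({"A2"}): 0.2,
    frozenset({"A1", "A2"}): 0.6,
}
print(shapley_values(["A1", "A2"], lambda s: toy_effects[s]))
# Attributions sum to the total effect 0.6: {'A1': 0.35, 'A2': 0.25}
```

By construction, the per-agent scores sum to the effect of the full coalition, which is what makes this style of attribution a decomposition rather than a heuristic ranking.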
Lay Summary: When a decision-making system fails, it is important to understand what went wrong and why. A common approach to this problem involves estimating the probability that the system would not have failed if a particular decision had been different. This quantity is known as the counterfactual effect, and it captures how pivotal a specific decision was to the failure. While informative, counterfactual effects can be difficult to interpret in complex systems where multiple agents make decisions over time. To our knowledge, our work is the first to address this interpretability challenge in such multi-agent sequential decision-making settings. We introduce a systematic approach to decomposing the effect of an agent's decision into its influence through other agents and through the underlying environment. The result is a set of scores attributed to each agent and environment state, reflecting their contributions to the counterfactual effect under analysis. Our method can be integrated with existing causal analysis tools to retrospectively analyze failures, offering more nuanced explanations and more principled judgments of accountability.
Link To Code: https://github.com/stelios30/cf-effect-decomposition
Primary Area: General Machine Learning->Causality
Keywords: counterfactual reasoning, causal explanation formula, multi-agent Markov decision processes, accountability
Submission Number: 15708