Outcome-Guided Counterfactuals from a Jointly Trained Generative Latent Space

Published: 26 Jul 2023, Last Modified: 12 Sept 2024 · XAI World · CC BY-NC-ND 4.0
Abstract: We present a novel generative method for producing higher-quality counterfactual examples for decision processes using a latent space that jointly encodes observations and associated behavioral outcome variables, such as classification decisions, actions taken, and estimated values. Our approach trains a variational autoencoder over behavior traces to both reconstruct observations and predict the outcome variables from the latent encoding. The resulting joint observation and outcome latent allows for unconditioned sampling of both observations and outcome variables from this space. This grants us the ability to generate counterfactuals using multiple methods, such as gradient-driven updates to move towards desired outcomes, interpolations against relevant cases drawn from a memory of examples, and combinations of these two. This also permits us to sample counterfactuals where constraints can be placed over some outcome variables, while others are allowed to vary. This flexibility further lets us directly address the plausibility of generated counterfactuals by using gradient-driven updates to raise the data-likelihood of generated examples. We use this method to analyze the behavior of reinforcement learning (RL) agents against several outcome variables that characterize agent behavior. From experiments in three different RL environments, we show that these methods produce counterfactuals that score higher on standard counterfactual quality measures of proximity to the query and plausibility, in contrast to observation-only gradient updates and case-based baselines. We also empirically demonstrate that counterfactuals sampled from a jointly trained space are of higher quality than those from the common practice of using latents from reconstruction-only autoencoders.
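The gradient-driven update described above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the paper's outcome predictor is a learned head on a jointly trained VAE latent, while here a fixed linear head `f(z) = w @ z` and hand-picked weights stand in so the example is self-contained. The counterfactual latent is found by descending on an outcome-target loss plus a proximity penalty to the query latent.

```python
import numpy as np

# Toy stand-in for gradient-driven counterfactual search in a joint latent
# space. All names (latent_counterfactual, w, z_q, lam) are illustrative;
# the paper's outcome head is a trained network, not a fixed linear map.

def latent_counterfactual(z_query, w, target, lam=0.1, lr=0.05, steps=200):
    """Descend on (f(z) - target)^2 + lam * ||z - z_query||^2 so the
    predicted outcome moves toward `target` while the latent stays
    near the query (the proximity term)."""
    z = z_query.copy()
    for _ in range(steps):
        gap = w @ z - target
        grad = 2.0 * gap * w + 2.0 * lam * (z - z_query)  # analytic gradient
        z -= lr * grad
    return z

w = np.array([0.5, -1.0, 0.25, 1.5])    # toy outcome-head weights
z_q = np.array([0.2, 0.1, -0.3, 0.4])   # latent of the query observation
z_cf = latent_counterfactual(z_q, w, target=1.0)
print(round(float(w @ z_cf), 2))        # outcome pulled toward 1.0 → 0.99
```

The proximity weight `lam` trades off how far the counterfactual outcome reaches toward the target against how close the counterfactual latent stays to the query; the paper's plausibility term (raising data-likelihood under the generative model) would appear as an additional gradient term in the same update.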
We conclude with an analysis of counterfactuals produced over the joint latent using combinations of latent and case-based approaches for an agent trained to play a complex real-time strategy game, and discuss future directions of investigation for this approach.