Outcome-Guided Counterfactuals from a Jointly Trained Generative Latent Space

Published: 01 Jan 2023, Last Modified: 04 Sep 2024. xAI (3) 2023. CC BY-SA 4.0.
Abstract: We present a novel generative method for producing higher-quality counterfactual examples for decision processes, using a latent space that jointly encodes observations and associated behavioral outcome variables such as classification decisions, actions taken, and estimated values. Our approach trains a variational autoencoder over behavior traces both to reconstruct observations and to predict the outcome variables from the latent encoding. The resulting joint observation–outcome latent space supports unconditioned sampling of both observations and outcome variables. This lets us generate counterfactuals in several ways: gradient-driven updates that move toward desired outcomes, interpolations against relevant cases drawn from a memory of examples, and combinations of the two. It also lets us sample counterfactuals in which constraints are placed on some outcome variables while others are allowed to vary, and to directly address the plausibility of generated counterfactuals by using gradient-driven updates to raise the data-likelihood of generated examples. We use this method to analyze the behavior of reinforcement learning (RL) agents with respect to several outcome variables that characterize agent behavior. In experiments across three RL environments, we show that these methods produce counterfactuals that score higher on standard quality measures of proximity to the query and plausibility than observation-only gradient updates and case-based baselines. We also demonstrate empirically that counterfactuals sampled from a jointly trained space are of higher quality than those from the common practice of using latents from reconstruction-only autoencoders.
We conclude with an analysis of counterfactuals produced over the joint latent using combinations of latent and case-based approaches for an agent trained to play a complex real-time strategy game, and discuss future directions of investigation for this approach.
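As an illustration of the gradient-driven latent update and the case-based interpolation mentioned above, the following is a minimal sketch, not the paper's implementation: it assumes a linear stand-in for the outcome head over a small joint latent, and all names, dimensions, and values are illustrative. In the actual method, a trained VAE would supply the latent encoding and the outcome predictor, and the updated latent would be decoded back into an observation.

```python
import numpy as np

# Minimal sketch of the gradient-driven counterfactual update described in
# the abstract. The linear outcome head W and all dimensions here are
# illustrative assumptions; a trained VAE would supply the latent z and a
# learned outcome predictor.

rng = np.random.default_rng(0)
latent_dim = 8
W = rng.normal(size=latent_dim)      # stand-in outcome head: y_hat = W @ z
z = rng.normal(size=latent_dim)      # latent encoding of the query observation
target = 1.0                         # desired counterfactual outcome value
lr = 0.05

for _ in range(300):
    y_hat = W @ z
    # gradient of 0.5 * (y_hat - target)**2 with respect to z
    z -= lr * (y_hat - target) * W

# Optional case-based step: interpolate toward the latent of a relevant
# memory example, pulling the counterfactual toward plausible regions.
z_case = rng.normal(size=latent_dim)   # latent of a retrieved memory case
alpha = 0.1                            # small interpolation weight
z_mixed = (1 - alpha) * z + alpha * z_case

print(float(W @ z))   # predicted outcome after the updates, near `target`
```

Combining the two steps (gradient update toward the desired outcome, then interpolation against a retrieved case) mirrors the "combinations of these two" methods the abstract describes; in practice the plausibility term would also enter the gradient via the data-likelihood of the decoded example.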