Abstract: As deep reinforcement learning is being applied to more and more tasks, there is a growing need to better understand and probe the learned agents. Visualizing and understanding the decision making process can be very valuable to comprehend and identify problems in the learned behavior. However, this topic has been relatively under-explored in the reinforcement learning community. In this work we present a method for synthesizing states of interest for a trained agent. Such states could be situations (e.g. crashing or damaging a car) in which specific actions are necessary. Further, critical states in which a very high or a very low reward can be achieved (e.g. risky states) are often interesting to understand the situational awareness of the system. To this end, we learn a generative model over the state space of the environment and use its latent space to optimize a target function for the state of interest. In our experiments we show that this method can generate insightful visualizations for a variety of environments and reinforcement learning methods. We explore these issues in the standard Atari benchmark games as well as in an autonomous driving simulator. Based on the efficiency with which we have been able to identify significant decision scenarios with this technique, we believe this general approach could serve as an important tool for AI safety applications.
Keywords: Visualization, Deep Reinforcement Learning
TL;DR: We present a method to synthesize states of interest for reinforcement learning agents in order to analyze their behavior.