Spatio-Temporal Abstractions in Reinforcement Learning Through Neural Encoding

Nir Baram, Tom Zahavy, Shie Mannor

Nov 04, 2016 (modified: Dec 13, 2016) ICLR 2017 conference submission readers: everyone
  • Abstract: Recent progress in the field of Reinforcement Learning (RL) has made it possible to tackle larger and more challenging tasks. However, the increasing complexity of these problems, together with the use of more sophisticated models such as Deep Neural Networks (DNNs), makes the behavior of artificial agents harder to understand. In this work, we present the Semi-Aggregated Markov Decision Process (SAMDP) model. The purpose of SAMDP modeling is to describe and enable a better understanding of complex behaviors by identifying temporal and spatial abstractions. In contrast to other modeling approaches, the SAMDP is built in a transformed state space that encodes the dynamics of the problem. We show that working with the \emph{right} state representation mitigates the problem of finding spatial and temporal abstractions. We describe the process of building the SAMDP model from observed trajectories and give examples of its use in a toy problem and in complex DQN policies. Finally, we show how the SAMDP can be used to monitor the policy at hand and make it more robust.
  • TL;DR: A method for understanding and improving deep agents by creating spatio-temporal abstractions
  • Keywords: Reinforcement Learning, Deep learning
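
The abstract describes building the SAMDP model from observed trajectories by aggregating an encoded state space and extracting cluster-level transitions. The sketch below illustrates that general idea under loose assumptions: it is not the paper's algorithm, just a minimal stand-in in which a plain k-means clustering plays the role of the spatial aggregation step and cluster-to-cluster transition counts play the role of the temporal abstraction. The function name `build_samdp` and all parameters are hypothetical.

```python
import numpy as np

def build_samdp(trajectories, n_clusters=3, n_iters=10, seed=0):
    """Hypothetical sketch: aggregate trajectories of encoded states.

    trajectories: list of arrays, each of shape (T_i, d), holding the
    encoded (transformed) states visited by the agent.
    Returns per-state cluster labels and a row-stochastic matrix of
    transitions between clusters (the aggregated "skill" structure).
    """
    rng = np.random.default_rng(seed)
    states = np.concatenate(trajectories, axis=0)

    # Spatial aggregation: plain k-means in the encoded state space.
    # (A stand-in for whatever aggregation the actual model uses.)
    centers = states[rng.choice(len(states), n_clusters, replace=False)]
    for _ in range(n_iters):
        dists = ((states[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = states[labels == k].mean(axis=0)

    # Temporal abstraction: count cluster-to-cluster transitions
    # along each observed trajectory, then normalize rows.
    counts = np.zeros((n_clusters, n_clusters))
    offset = 0
    for traj in trajectories:
        lab = labels[offset:offset + len(traj)]
        offset += len(traj)
        for a, b in zip(lab[:-1], lab[1:]):
            counts[a, b] += 1
    row = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)
    return labels, P
```

With a good state representation, well-separated clusters yield a near-block-diagonal transition matrix, which is what makes the abstractions easy to read off.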