- TL;DR: We introduce DPFRL, a framework for reinforcement learning under partial and complex observations with a fully differentiable discriminative particle filter
- Abstract: Deep reinforcement learning has succeeded in sophisticated games such as Atari, Go, etc. Real-world decision making, however, often requires reasoning with partial information extracted from complex visual observations. This paper presents Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for partial and complex observations. DPFRL encodes a differentiable particle filter with learned transition and observation models in a neural network, which allows for reasoning with partial observations over multiple time steps. While a standard particle filter relies on a generative observation model, DPFRL learns a discriminatively parameterized model that is training directly for decision making. We show that the discriminative parameterization results in significantly improved performance, especially for tasks with complex visual observations, because it circumvents the difficulty of modelling observations explicitly. In most cases, DPFRL outperforms state-of-the-art POMDP RL models in Flickering Atari Games, an existing POMDP RL benchmark, and in Natural Flickering Atari Games, a new, more challenging POMDP RL benchmark that we introduce. We further show that DPFRL performs well for visual navigation with real-world data.
- Keywords: Reinforcement Learning, Partial Observability, Differentiable Particle Filtering