- Keywords: Thompson Sampling, Deep Learning, Reinforcement Learning
- TL;DR: A Deep Learning adaptation of Randomized Least Squares Value Iteration
- Abstract: Exploration while learning representations is one of the main challenges Deep Reinforcement Learning (DRL) faces today. As the learned representation is dependant in the observed data, the exploration strategy has a crucial role. The popular DQN algorithm has improved significantly the capabilities of Reinforcement Learning (RL) algorithms to learn state representations from raw data, yet, it uses a naive exploration strategy which is statistically inefficient. The Randomized Least Squares Value Iteration (RLSVI) algorithm (Osband et al., 2016), on the other hand, explores and generalizes efficiently via linearly parameterized value functions. However, it is based on hand-designed state representation that requires prior engineering work for every environment. In this paper, we propose a Deep Learning adaptation for RLSVI. Rather than using hand-design state representation, we use a state representation that is being learned directly from the data by a DQN agent. As the representation is being optimized during the learning process, a key component for the suggested method is a likelihood matching mechanism, which adapts to the changing representations. We demonstrate the importance of the various properties of our algorithm on a toy problem and show that our method outperforms DQN in five Atari benchmarks, reaching competitive results with the Rainbow algorithm.