Keywords: Thompson Sampling, Deep Learning, Reinforcement Learning
TL;DR: A Deep Learning adaptation of Randomized Least Squares Value Iteration
Abstract: Exploration while learning representations is one of the main challenges Deep Reinforcement Learning (DRL) faces today. As the learned representation depends on the observed data, the exploration strategy plays a crucial role. The popular DQN algorithm has significantly improved the ability of Reinforcement Learning (RL) algorithms to learn state representations from raw data, yet it uses a naive exploration strategy that is statistically inefficient. The Randomized Least Squares Value Iteration (RLSVI) algorithm (Osband et al., 2016), on the other hand, explores and generalizes efficiently via linearly parameterized value functions. However, it relies on a hand-designed state representation that requires prior engineering work for every environment. In this paper, we propose a Deep Learning adaptation of RLSVI. Rather than using a hand-designed state representation, we use a state representation that is learned directly from the data by a DQN agent. Since the representation is optimized during the learning process, a key component of the proposed method is a likelihood matching mechanism that adapts to the changing representation. We demonstrate the importance of the various properties of our algorithm on a toy problem and show that our method outperforms DQN on five Atari benchmarks, reaching results competitive with the Rainbow algorithm.
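The abstract's core idea, RLSVI-style exploration via linearly parameterized value functions on top of learned features, can be illustrated with a minimal sketch. This is not the authors' implementation; the feature matrix, targets, and hyperparameters below are placeholders standing in for DQN-extracted features and TD regression targets.

```python
import numpy as np

def posterior(Phi, targets, noise_var=1.0, prior_var=1.0):
    """Gaussian posterior over linear value weights, given a feature matrix
    Phi (N x d) of phi(s, a) vectors and scalar regression targets (N,)."""
    d = Phi.shape[1]
    precision = Phi.T @ Phi / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ targets / noise_var
    return mean, cov

def sample_weights(mean, cov, rng):
    """Draw one plausible weight vector from the posterior; acting greedily
    with respect to it yields randomized, posterior-consistent exploration
    instead of epsilon-greedy noise (the Thompson-sampling flavor of RLSVI)."""
    return rng.multivariate_normal(mean, cov)

# Hypothetical usage with synthetic data in place of DQN features and targets.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(500, 16))                      # stand-in for phi(s, a) features
targets = Phi @ rng.normal(size=16) + 0.1 * rng.normal(size=500)
mean, cov = posterior(Phi, targets)
w_tilde = sample_weights(mean, cov, rng)              # Q_tilde(s, a) = phi(s, a) @ w_tilde
```

Because the DQN features change as training progresses, the paper's likelihood matching mechanism is what keeps such a posterior consistent across representation updates; that mechanism is not reproduced in this sketch.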