Keywords: Reinforcement Learning, Stochastic Planning, Delayed Feedback
TL;DR: A novel method to mitigate performance degradation in environments with delayed feedback.
Abstract: In this paper, we introduce a new reinforcement learning method for control problems in environments with delayed feedback. Specifically, our method employs stochastic planning, in contrast to previous methods that used deterministic planning. This allows us to embed risk preference in the policy optimization problem. We show that this formulation can recover the optimal policy for problems with deterministic transitions. We contrast our policy with two prior methods from the literature. We apply the methodology to simple tasks to understand its features. Then, we compare the performance of the methods in controlling multiple Atari games.
Primary Keywords: Applications, Learning
Category: Long
Student: Graduate
Supplementary Material: zip
Submission Number: 175