- Keywords: deep reinforcement learning, quantile regression, vector reward
- TL;DR: Using state-aligned vector rewards, we train an agent that predicts state changes from action distributions, with a new reinforcement learning technique inspired by quantile regression.
- Abstract: The sample efficiency of deep reinforcement learning (DRL) algorithms is limited by the weak scalar training signal. We propose using state-aligned vector rewards to capitalize on the spatiotemporal nature of reaching problems, and we show that a state change distribution can be learned given an action distribution. Our agent, trained with a new DRL method inspired by quantile regression, learns several times faster in high-dimensional state spaces than a classical DRL algorithm.
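The quantile-regression inspiration mentioned above can be illustrated with the standard pinball loss, whose asymmetric penalty drives a predictor toward the tau-quantile of the target distribution. This is only a generic sketch of the underlying idea, not the paper's exact training objective:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss.

    Penalizes under-prediction with weight tau and over-prediction with
    weight (1 - tau), so its minimizer is the tau-quantile of y_true.
    """
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# With tau = 0.5 the loss reduces to half the mean absolute error,
# whose minimizer is the median of the targets.
y = np.array([1.0, 2.0, 3.0])
loss = pinball_loss(y, 2.0, tau=0.5)
```

In distributional RL variants built on this idea, one network head per quantile level is trained with its own tau, so the set of heads approximates the full return (or, here, state change) distribution rather than only its mean.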