Quantile Regression Reinforcement Learning with State Aligned Vector Rewards

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference · Withdrawn Submission
Abstract: Learning from a scalar reward in continuous action space environments is difficult and often requires millions, if not billions, of interactions. We introduce state aligned vector rewards, which are easily defined in metric state spaces and allow our deep reinforcement learning agent to tackle the curse of dimensionality. Our agent learns to map from action distributions to state change distributions, implicitly defined in a quantile function neural network. We further introduce a new reinforcement learning technique inspired by quantile regression which does not limit agents to explicitly parameterized action distributions. Our results in high dimensional state spaces show that training with vector rewards allows our agent to learn several times faster than an agent trained with scalar rewards.
Keywords: deep reinforcement learning, quantile regression, vector reward
TL;DR: Using state aligned vector rewards, we train an agent that predicts state changes from action distributions, with a new reinforcement learning technique inspired by quantile regression.
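As background for the technique the abstract names, quantile regression fits a chosen quantile of a distribution by minimizing the pinball loss. The sketch below is illustrative only and is not taken from the paper; the function names and the use of NumPy are assumptions.

```python
import numpy as np

def pinball_loss(tau, target, pred):
    # Pinball (quantile regression) loss for quantile level tau in (0, 1):
    # penalizes under-predictions with weight tau and over-predictions
    # with weight (1 - tau), so its minimizer is the tau-quantile.
    u = np.asarray(target) - np.asarray(pred)
    return np.maximum(tau * u, (tau - 1.0) * u)

def fit_quantile(samples, tau, lr=0.05, steps=2000):
    # Minimal gradient-descent sketch: a single scalar prediction trained
    # with the pinball loss converges to the empirical tau-quantile.
    q = 0.0
    for _ in range(steps):
        u = samples - q
        # Subgradient of the pinball loss w.r.t. the prediction q.
        grad = np.where(u > 0, -tau, 1.0 - tau).mean()
        q -= lr * grad
    return q
```

For example, fitting `tau=0.5` on samples recovers (approximately) their median; in the paper's setting, a quantile function network is trained analogously to represent whole state change distributions rather than a single scalar.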