Learning Goal-Conditioned Value Functions with one-step Path rewards rather than Goal-RewardsDownload PDF

27 Sep 2018 (modified: 21 Dec 2018)ICLR 2019 Conference Blind SubmissionReaders: Everyone
  • Abstract: Multi-goal reinforcement learning (MGRL) addresses tasks where the desired goal state can change for every trial. State-of-the-art algorithms model these problems such that the reward formulation depends on the goals, to associate them with high reward. This dependence introduces additional goal reward resampling steps in algorithms like Hindsight Experience Replay (HER) that reuse trials in which the agent fails to reach the goal by recomputing rewards as if reached states were psuedo-desired goals. We propose a reformulation of goal-conditioned value functions for MGRL that yields a similar algorithm, while removing the dependence of reward functions on the goal. Our formulation thus obviates the requirement of reward-recomputation that is needed by HER and its extensions. We also extend a closely related algorithm, Floyd-Warshall Reinforcement Learning, from tabular domains to deep neural networks for use as a baseline. Our results are competetive with HER while substantially improving sampling efficiency in terms of reward computation.
  • Keywords: Floyd-Warshall, Reinforcement learning, goal conditioned value functions, multi-goal
  • TL;DR: Do Goal-Conditioned Value Functions need Goal-Rewards to Learn?
13 Replies