Learning Goal-Conditioned Value Functions with one-step Path rewards rather than Goal-Rewards

Vikas Dhiman; Shurjo Banerjee; Jeffrey M Siskind; Jason J Corso

27 Sept 2018 (modified: 05 May 2023), ICLR 2019 Conference Blind Submission

Abstract: Multi-goal reinforcement learning (MGRL) addresses tasks in which the desired goal state can change from trial to trial. State-of-the-art algorithms formulate these problems with goal-dependent rewards, so that reaching the goal yields high reward. This dependence forces algorithms such as Hindsight Experience Replay (HER) to add goal-reward resampling steps: HER reuses trials in which the agent fails to reach the goal by treating states it did reach as pseudo-desired goals, and must recompute the reward for every such relabeled goal. We propose a reformulation of goal-conditioned value functions for MGRL that yields a similar algorithm while removing the dependence of the reward function on the goal. Our formulation therefore obviates the reward recomputation required by HER and its extensions. We also extend a closely related algorithm, Floyd-Warshall Reinforcement Learning, from tabular domains to deep neural networks for use as a baseline. Our results are competitive with HER while substantially improving sample efficiency as measured by the number of reward computations.
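The difference in reward computations can be illustrated with a toy count. The sketch below is hypothetical and is not the authors' implementation. It assumes a deterministic chain of integer states, HER's "future" relabeling strategy with k pseudo-goals per transition, a sparse goal reward (0 when the next state equals the goal, -1 otherwise), and, for contrast, a goal-independent reward of -1 per step. All names (`goal_reward`, `path_reward`, `her_relabel`, `path_relabel`, `CountingReward`) are illustrative.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy setup: a transition is (state, action, next_state) on an integer chain.
Transition = Tuple[int, int, int]
Sample = Tuple[int, int, int, int, float]  # (state, action, next_state, goal, reward)


class CountingReward:
    """Wraps a reward function and counts how many times it is evaluated."""

    def __init__(self, fn: Callable[..., float]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, *args: int) -> float:
        self.calls += 1
        return self.fn(*args)


def goal_reward(next_state: int, goal: int) -> float:
    # Goal-dependent sparse reward (assumed): 0 on reaching the goal, -1 otherwise.
    return 0.0 if next_state == goal else -1.0


def path_reward(state: int, next_state: int) -> float:
    # Goal-independent one-step reward (assumed): every step costs -1 whatever the goal is.
    return -1.0


def her_relabel(episode: List[Transition], reward_fn: CountingReward,
                k: int, rng: random.Random) -> List[Sample]:
    """HER-style 'future' relabeling: the reward is recomputed for every pseudo-goal."""
    samples: List[Sample] = []
    for t, (s, a, s_next) in enumerate(episode):
        for _ in range(k):
            g = episode[rng.randrange(t, len(episode))][2]  # achieved state as pseudo-goal
            samples.append((s, a, s_next, g, reward_fn(s_next, g)))
    return samples


def path_relabel(episode: List[Transition], reward_fn: CountingReward,
                 k: int, rng: random.Random) -> List[Sample]:
    """Goal-independent reward: computed once per step, then reused for every pseudo-goal."""
    step_rewards = [reward_fn(s, s_next) for (s, _, s_next) in episode]
    samples: List[Sample] = []
    for t, (s, a, s_next) in enumerate(episode):
        for _ in range(k):
            g = episode[rng.randrange(t, len(episode))][2]
            samples.append((s, a, s_next, g, step_rewards[t]))
    return samples


if __name__ == "__main__":
    episode = [(s, 1, s + 1) for s in range(10)]  # 10 steps along the chain 0 -> 10
    her_r, path_r = CountingReward(goal_reward), CountingReward(path_reward)
    her_relabel(episode, her_r, k=4, rng=random.Random(0))
    path_relabel(episode, path_r, k=4, rng=random.Random(0))
    print(her_r.calls, path_r.calls)  # prints "40 10"
```

With 10 transitions and k = 4, the goal-dependent version evaluates the reward 40 times and the goal-independent version 10 times. The sketch only counts reward evaluations; it does not show how a value function learns which states satisfy a goal without a goal reward, which is the part the proposed reformulation addresses.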

Keywords: Floyd-Warshall, reinforcement learning, goal-conditioned value functions, multi-goal

TL;DR: Do Goal-Conditioned Value Functions need Goal-Rewards to Learn?
