Distributional Distance Classifiers for Goal-Conditioned Reinforcement Learning

Ravi Tej Akella; Benjamin Eysenbach; Jeff Schneider; Russ Salakhutdinov

22 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024

Primary Area: reinforcement learning

Keywords: reinforcement learning, goal-conditioned, dynamical distance learning, stochastic, shortest path

TL;DR: We connect two objectives for goal-conditioned RL and use this connection to build a new RL algorithm.

Abstract: What does it mean to find the shortest path in stochastic environments if every strategy has a non-zero probability of failing? At the core of this question is a conflict between two seemingly natural notions of planning: maximizing the probability of reaching a goal state and minimizing the expected number of steps to reach that goal state. Reinforcement learning (RL) methods based on minimizing the steps to a goal make an implicit assumption: that the goal is always reached within some finite horizon. This assumption is violated in practical settings and can lead to suboptimal strategies. In this paper, we bridge the gap between these two notions of planning by estimating the probability of reaching the goal at different future timesteps. This is not the same as estimating the distance to the goal; rather, probabilities convey uncertainty about ever reaching the goal at all. We then propose a practical RL algorithm, Distributional NCE, for estimating these probabilities. Taken together, our results provide a way of thinking about probabilities and distances in stochastic settings, along with a practical and effective algorithm for goal-conditioned RL.
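The conflict between the two notions of planning can be illustrated with a toy calculation. The sketch below is not the paper's Distributional NCE algorithm; it assumes two hypothetical strategies (`risky_shortcut` and `safe_detour`, names and numbers invented for illustration), each described by the probability of first reaching the goal at a given timestep, with any remaining probability mass meaning the goal is never reached.

```python
# Toy illustration only: hypothetical strategies, not taken from the paper.
# Each strategy maps "timestep of first reaching the goal" -> probability.
# Probability mass not listed corresponds to never reaching the goal.
risky_shortcut = {2: 0.6}    # fast when it works, but fails 40% of the time
safe_detour = {5: 0.95}      # slower, but fails only 5% of the time


def success_probability(hit_probs):
    """Probability of ever reaching the goal."""
    return sum(hit_probs.values())


def expected_steps_given_success(hit_probs):
    """Expected number of steps, counting only trajectories that reach the goal."""
    total = success_probability(hit_probs)
    return sum(t * p for t, p in hit_probs.items()) / total


strategies = {"risky_shortcut": risky_shortcut, "safe_detour": safe_detour}

# Criterion 1: maximize the probability of reaching the goal.
best_by_probability = max(
    strategies, key=lambda name: success_probability(strategies[name])
)

# Criterion 2: minimize the expected number of steps to the goal,
# implicitly assuming the goal is always reached.
best_by_steps = min(
    strategies, key=lambda name: expected_steps_given_success(strategies[name])
)

print("best by success probability:", best_by_probability)
print("best by expected steps:", best_by_steps)
```

Under these assumed numbers the two criteria pick different strategies, which is the kind of disagreement the abstract describes. Keeping the full per-timestep distribution, rather than a single distance, preserves the information needed to see both the speed and the failure risk of each strategy.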

Supplementary Material: zip

Submission Number: 5009
