Distributional Distance Classifiers for Goal-Conditioned Reinforcement Learning

Published: 19 Jun 2023, Last Modified: 09 Jul 2023
Venue: Frontiers4LCD
Keywords: Reinforcement Learning, Goal-Conditioned, Dynamical Distance Learning, Contrastive Learning
TL;DR: A distributional variant of classifier learning that reconciles two notions of planning: maximizing the probability of reaching a goal state, and minimizing the expected number of steps to reach that goal state.
Abstract: Autonomous systems are increasingly being deployed in stochastic real-world environments. Often, these agents are trying to find the shortest path to a commanded goal. But what does it mean to find the shortest path in stochastic environments, where every strategy has a non-zero probability of failing? At the core of this question is a conflict between two seemingly natural notions of planning: maximizing the probability of reaching a goal state, and minimizing the expected number of steps to reach that goal state. Reinforcement learning (RL) methods based on minimizing the steps to a goal make an implicit assumption: that the goal is always reached, at least within some finite horizon. This assumption is violated in practical settings and can lead to highly suboptimal strategies. In this paper, we bridge the gap between these two notions of planning by estimating the probability of reaching the goal at different future timesteps. This is not the same as estimating the distance to the goal: rather, the probabilities convey uncertainty about whether the goal is ever reached at all. Our value function resembles that used in distributional RL, but is used to solve (reward-free) goal-reaching tasks rather than (single) reward-maximization tasks. Taken together, we believe that our results provide a cogent framework for thinking about probabilities and distances in stochastic settings, along with a practical and effective algorithm for goal-conditioned RL.
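The conflict between the two notions of planning can be made concrete with a small numeric sketch. The example below is purely illustrative (the strategies, probabilities, horizon, and helper functions are our own hypothetical choices, not the paper's algorithm or environments): each strategy is summarized by the probability of reaching the goal for the first time at each timestep, from which both the success probability and the expected steps conditioned on success follow.

```python
# Hypothetical illustration: two fixed strategies, each described by p[t],
# the probability of reaching the goal for the first time at timestep t.
import numpy as np

horizon = 10

# "Risky shortcut": reaches the goal in 2 steps, but only 60% of the time.
risky = np.zeros(horizon)
risky[2] = 0.6

# "Safe detour": reaches the goal in 6 steps, 99% of the time.
safe = np.zeros(horizon)
safe[6] = 0.99

def success_probability(p):
    """Probability of ever reaching the goal within the horizon."""
    return p.sum()

def expected_steps_given_success(p):
    """Expected number of steps to the goal, conditioned on reaching it."""
    return (np.arange(horizon) * p).sum() / p.sum()

for name, p in [("risky", risky), ("safe", safe)]:
    print(name,
          "P(reach goal) =", round(success_probability(p), 2),
          "E[steps | reach goal] =", round(expected_steps_given_success(p), 2))

# Minimizing expected steps (implicitly assuming the goal is always reached)
# prefers the risky strategy; maximizing the probability of ever reaching the
# goal prefers the safe one. Estimating the full per-timestep probabilities,
# as the abstract proposes, exposes both quantities at once.
```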
Submission Number: 46