Keywords: Goal-conditioned reinforcement learning, metric learning
TL;DR: We show how to learn representations of temporal distances that exploit quasimetric architectures in offline GCRL.
Abstract: Approaches for goal-conditioned reinforcement learning (GCRL) often use
learned state representations to extract goal-reaching policies. Two
frameworks for representation structure have yielded particularly
effective GCRL algorithms: (1) *contrastive representations*, in which
methods learn "successor features" with a contrastive objective that
performs inference over future outcomes, and (2) *temporal distances*,
which link the (quasimetric) distance in representation space to the
transit time from states to goals. We propose an approach that unifies
these two frameworks, combining the structure of a quasimetric
representation space (the triangle inequality) with the right additional
constraints to learn successor representations that enable optimal
goal-reaching. Unlike past work, our approach exploits a
**quasimetric** distance parameterization to learn **optimal**
goal-reaching distances, even with **suboptimal** data and in
**stochastic** environments. This gives us the best of both worlds: we
retain the stability and long-horizon capabilities of Monte Carlo
contrastive RL methods, while gaining the free stitching capabilities of
quasimetric network parameterizations. On existing offline GCRL
benchmarks, our representation learning objective improves performance
on stitching tasks where methods based on contrastive learning struggle,
and on noisy, high-dimensional environments where methods based on
quasimetric networks struggle.
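
To make the combination of the two frameworks concrete, below is a minimal sketch (not the paper's exact objective or architecture; all class names, hyperparameters, and helper functions are illustrative assumptions) of a quasimetric distance head, d(s, g) = Σᵢ relu(φ(s)ᵢ − ψ(g)ᵢ), which satisfies d(x, x) = 0 and the triangle inequality by construction, trained with an InfoNCE-style contrastive loss in which a goal sampled from the same trajectory's future acts as the positive and the other batch entries act as negatives.

```python
# Illustrative sketch only: a componentwise-ReLU quasimetric head on top of
# learned state/goal encoders, trained with an InfoNCE-style contrastive loss.
# Names (QuasimetricCritic, contrastive_loss, latent_dim, ...) are assumptions,
# not identifiers from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuasimetricCritic(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int = 64):
        super().__init__()
        # Separate encoders for states and goals; any backbone works here.
        self.state_enc = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                       nn.Linear(256, latent_dim))
        self.goal_enc = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                      nn.Linear(256, latent_dim))

    def distance(self, s: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # d(s, g) = sum_i relu(phi(s)_i - psi(g)_i): asymmetric, zero on the
        # diagonal, and it satisfies the triangle inequality, so it is a
        # valid quasimetric.
        return F.relu(self.state_enc(s) - self.goal_enc(g)).sum(-1)


def contrastive_loss(critic: QuasimetricCritic,
                     states: torch.Tensor,
                     future_goals: torch.Tensor) -> torch.Tensor:
    # InfoNCE over a batch: future_goals[i] is sampled from the future of the
    # trajectory containing states[i] (positive); the other rows serve as
    # negatives. The logit for pair (i, j) is the negative quasimetric
    # distance, i.e. the same formula as .distance broadcast across the batch.
    B = states.shape[0]
    pairwise_d = F.relu(critic.state_enc(states)[:, None, :]
                        - critic.goal_enc(future_goals)[None, :, :]).sum(-1)  # (B, B)
    logits = -pairwise_d  # shorter distance -> higher score
    labels = torch.arange(B, device=states.device)
    return F.cross_entropy(logits, labels)
```

Because the triangle inequality holds for every pair of encodings, the learned distance between states that never appear in the same trajectory is still bounded by sums of distances along observed segments, which is the "free stitching" behavior the abstract attributes to quasimetric parameterizations.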
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 20836