Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning

Published: 20 Jun 2025, Last Modified: 22 Jul 2025 · RLVG Workshop - RLC 2025 · CC BY 4.0
Keywords: reinforcement learning, deep learning, goal-conditioned reinforcement learning, long horizon, navigation, quasimetrics
TL;DR: ProQ learns a quasimetric + OOD filter that turns offline trajectories into a few in-distribution keypoints. Floyd-Warshall on this tiny graph yields sub-goals; an AWR controller follows them.
Abstract: Offline Goal-Conditioned Reinforcement Learning seeks to train agents to reach specified goals using previously collected reward-free data. Scaling this promise to long-horizon tasks with complex dynamics remains challenging, notably because of compounding value-estimation errors. Principled geometric learning offers a potential remedy. Following this insight, we introduce Projective Quasimetric Planning (ProQ), a compositional framework that learns a differentiable asymmetric distance and repurposes it in two ways: first as a repulsive energy that forces a sparse set of keypoints to spread uniformly over the learned latent space, and second as a structured directional cost guiding the agent toward proximal sub-goals. ProQ couples this geometry with a Lagrangian out-of-distribution detector that keeps the keypoints within reachable regions. By unifying metric learning, keypoint coverage, and goal-conditioned control, our approach produces meaningful sub-goals and robustly drives long-horizon goal-reaching on diverse navigation and manipulation benchmarks.
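The planning step described in the TL;DR and abstract (all-pairs shortest paths over a small keypoint graph, then handing the first hop to a low-level controller) can be illustrated with a minimal sketch. This is not the authors' implementation: `d_theta`, the keypoint set, and the nearest-keypoint snapping are assumed placeholders for the learned quasimetric and coverage module, and the toy usage substitutes a symmetric Euclidean distance for the learned asymmetric one.

```python
import numpy as np

def floyd_warshall(cost: np.ndarray):
    """All-pairs shortest paths on a directed K x K cost matrix.

    Returns the shortest-path distance matrix and a successor table
    used to read off the first hop (the next sub-goal) of each path.
    """
    K = cost.shape[0]
    dist = cost.copy()
    nxt = np.tile(np.arange(K), (K, 1))       # nxt[i, j] = first hop from i toward j
    for k in range(K):
        via = dist[:, [k]] + dist[[k], :]     # candidate path i -> k -> j
        better = via < dist
        dist = np.where(better, via, dist)
        nxt = np.where(better, nxt[:, [k]], nxt)
    return dist, nxt

def next_subgoal(state_z, goal_z, keypoints_z, d_theta, nxt):
    """Pick the next keypoint to hand to the low-level (e.g. AWR) controller.

    `d_theta(a, b)` stands in for the learned quasimetric in latent space;
    the current state and the goal are snapped to their nearest keypoints.
    """
    i = int(np.argmin([d_theta(state_z, kp) for kp in keypoints_z]))
    j = int(np.argmin([d_theta(kp, goal_z) for kp in keypoints_z]))
    return keypoints_z[nxt[i, j]]

# Toy usage with a Euclidean stand-in for the learned asymmetric distance.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keypoints = rng.uniform(size=(8, 2))
    d = lambda a, b: float(np.linalg.norm(a - b))
    cost = np.array([[d(a, b) for b in keypoints] for a in keypoints])
    dist, nxt = floyd_warshall(cost)
    subgoal = next_subgoal(keypoints[0], keypoints[-1], keypoints, d, nxt)
    print("next sub-goal:", subgoal)
```

Because the keypoint set is small, the cubic cost of Floyd-Warshall is negligible, and replanning amounts to a single table lookup once the successor table is built.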
Submission Number: 17