Keywords: reinforcement learning, deep learning, goal-conditioned reinforcement learning, long horizon, navigation, quasimetrics
TL;DR: ProQ learns a differentiable asymmetric (quasimetric) distance and reuses it both to spread a sparse set of keypoints uniformly over a learned latent space and to guide the agent toward proximal sub-goals, enabling long-horizon offline goal-conditioned RL.
Abstract: Offline Goal-Conditioned Reinforcement Learning seeks to train agents to reach specified goals from previously collected reward-free data. Scaling that promise to long-horizon tasks with complex dynamics remains challenging, notably because of compounding value-estimation errors. Principled geometric learning offers a potential remedy. Following this insight, we introduce Projective Quasimetric Planning (ProQ), a compositional framework that learns a differentiable asymmetric distance and then repurposes it, first as a repulsive energy forcing a sparse set of keypoints to spread uniformly over the learned latent space, and second as a structured directional cost guiding the agent toward proximal sub-goals. ProQ couples this geometry with a Lagrangian out-of-distribution detector that keeps the keypoints within reachable areas. By unifying metric learning, keypoint coverage, and goal-conditioned control, our approach produces meaningful sub-goals and robustly drives long-horizon goal-reaching on diverse navigation and manipulation benchmarks.
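To make the repulsive-energy idea in the abstract concrete, below is a minimal PyTorch sketch. It is an illustration under assumptions, not the paper's implementation: the function name `repulsive_energy`, the keypoint shapes, and the placeholder distance are all hypothetical, and a real quasimetric network (producing asymmetric distances) would replace the symmetric `torch.cdist` stand-in.

```python
import torch

def repulsive_energy(keypoints: torch.Tensor, quasimetric) -> torch.Tensor:
    """Sum of inverse pairwise distances between keypoints (hypothetical sketch).

    keypoints: (K, D) latent coordinates, treated as learnable parameters.
    quasimetric: callable mapping two (K, D) tensors to a (K, K) matrix of
        distances; for a true quasimetric, d[i, j] != d[j, i] in general.
    """
    d = quasimetric(keypoints, keypoints)                      # (K, K) distances
    mask = ~torch.eye(len(keypoints), dtype=torch.bool, device=d.device)
    return (1.0 / (d[mask] + 1e-6)).sum()                      # large when keypoints cluster

# Toy usage: spread K keypoints by gradient descent on the repulsive energy.
# The lambda below is a symmetric placeholder, standing in for a learned
# asymmetric distance model.
K, D = 32, 8
kp = torch.nn.Parameter(torch.randn(K, D))
dist = lambda a, b: torch.cdist(a, b) + 1e-3                   # placeholder metric
opt = torch.optim.Adam([kp], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = repulsive_energy(kp, dist)
    loss.backward()
    opt.step()
```

Minimizing this energy pushes keypoints apart until they cover the space roughly uniformly; in ProQ this coverage objective would additionally be constrained by the out-of-distribution detector so keypoints stay in reachable regions.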
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Anthony_Kobanda1
Track: Regular Track: unpublished work
Submission Number: 144