MR-CRL: Leveraging Predictive Representations for Contrastive Goal-Conditioned Reinforcement Learning
Keywords: Contrastive reinforcement learning, model-based reinforcement learning, representation learning, self-supervised learning, contrastive learning, goal-conditioned reinforcement learning
TL;DR: We introduce MR-CRL, a contrastive reinforcement learning framework enhanced with model-based predictive representations trained via cross-entropy loss, improving performance in goal-conditioned offline RL on some tasks.
Abstract: Goal-conditioned reinforcement learning (GCRL) aims to train agents capable of achieving arbitrary goals, a task made significantly harder in offline settings where rewards and environment interaction are unavailable. Contrastive Reinforcement Learning (CRL) is a goal-conditioned framework that learns value functions through contrastive objectives, enabling effective policy learning from offline datasets without reward labels or environment interaction. In parallel, model-based reinforcement learning (MBRL) has shown that learning predictive representations of environment dynamics can significantly improve policy performance and sample efficiency. While both approaches learn features that anticipate future states, their integration remains underexplored. In this work, we investigate whether model-based predictive representations can enhance CRL’s similarity-based value estimation. We propose Model-based Representations for Contrastive Reinforcement Learning (MR-CRL), a simple extension that augments CRL with predictive state and dynamics encoders trained using a novel cross-entropy objective over latent dynamics predictions. We evaluate multiple integration strategies within the CRL architecture and find that MR-CRL outperforms the original CRL baseline on 4 out of 18 tasks in the OGBench benchmark, with significant gains in both low- and high-dimensional environments. While gains are not universal, our results suggest that model-based inductive biases can improve goal-reaching performance on some tasks.
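To make the two objectives in the abstract concrete, here is a minimal sketch (not the authors' code) of a CRL-style contrastive critic loss alongside a cross-entropy loss over latent dynamics predictions, which is one plausible reading of the MR-CRL objective. All function names, module shapes, and the batch-negatives construction are illustrative assumptions.

```python
# Illustrative sketch of the two losses described in the abstract.
# Shapes, names, and the use of in-batch negatives are assumptions,
# not the authors' implementation.
import torch
import torch.nn.functional as F

def contrastive_critic_loss(phi_sa: torch.Tensor, psi_g: torch.Tensor) -> torch.Tensor:
    """CRL-style InfoNCE loss: state-action embeddings phi_sa [B, D] should
    score highly against goal embeddings psi_g [B, D] drawn from the same
    trajectory's future; other goals in the batch serve as negatives."""
    logits = phi_sa @ psi_g.T                                  # [B, B] similarity matrix
    labels = torch.arange(logits.shape[0], device=logits.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

def latent_dynamics_ce_loss(z_pred: torch.Tensor, z_next: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over latent dynamics predictions (assumed form): the
    predicted next latent z_pred [B, D] must identify the encoder's true
    next-state latent z_next [B, D] among the batch."""
    logits = z_pred @ z_next.detach().T                        # stop-gradient on targets
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)
```

Under this reading, both losses share the same classification structure: the contrastive critic classifies which goal belongs to a state-action pair, while the dynamics head classifies which next-state latent follows from the current one, which is one way a predictive inductive bias could plug into CRL's similarity-based value estimation.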
Submission Number: 24