Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
Keywords: Offline Goal-Conditioned Reinforcement Learning, Reward Shaping
TL;DR: We propose a novel and effective reward-shaping method for credit assignment based on generative modeling of the occupancy measure and optimal transport, demonstrating state-of-the-art performance in offline GCRL.
Abstract: While offline goal-conditioned reinforcement learning (GCRL) provides a simple recipe to train generalist policies from large unlabeled datasets, offline GCRL agents trained with sparse rewards typically struggle on long-horizon tasks. Manually designing task-specific reward functions undermines the simplicity, scalability, and generality of this paradigm. Moreover, prior approaches to learning rewards for effective credit assignment fail to adequately capture goal-reaching information as tasks scale in complexity. To address this gap, we propose $\textrm{\textbf{Occupancy Reward Shaping (ORS)}}$, a novel reward-shaping approach that leverages a learned occupancy measure, a distribution that naturally captures complex long-horizon temporal dependencies between states, and distills goal-reaching information from it into a general-purpose reward function for effective credit assignment. We demonstrate that ORS achieves a $\mathbf{2.3\times}$ average improvement in performance over its base RL algorithm across a diverse set of long-horizon locomotion and manipulation tasks and outperforms prior state-of-the-art methods.
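To make the core idea concrete, here is a minimal sketch of occupancy-based reward shaping in the general, potential-based form; it is illustrative only and not the authors' ORS implementation. The `OccupancyModel` class, its `log_prob` method, and the `alpha` weight are hypothetical stand-ins for a learned generative model of the discounted future state-occupancy distribution.

```python
import numpy as np

class OccupancyModel:
    """Hypothetical stand-in for a learned generative occupancy model.

    Given a state s, it scores how likely goal g is under the discounted
    future state-occupancy distribution. A real model would be learned
    (e.g., a normalizing flow or VAE); here we use a toy Gaussian density
    centered on the current state purely for illustration.
    """

    def log_prob(self, state: np.ndarray, goal: np.ndarray) -> float:
        return float(-0.5 * np.sum((goal - state) ** 2))


def shaped_reward(occ: OccupancyModel,
                  state: np.ndarray,
                  next_state: np.ndarray,
                  goal: np.ndarray,
                  sparse_reward: float,
                  alpha: float = 1.0,
                  gamma: float = 0.99) -> float:
    """Sparse goal reward plus a potential-based shaping term.

    The potential is the occupancy model's goal log-likelihood, so the
    shaping bonus is positive when a transition moves to a state from
    which the goal is more likely to be reached.
    """
    phi_next = occ.log_prob(next_state, goal)  # potential at next state
    phi = occ.log_prob(state, goal)            # potential at current state
    return sparse_reward + alpha * (gamma * phi_next - phi)
```

Because the shaping term is potential-based, it preserves the optimal policy of the underlying sparse-reward MDP while densifying the learning signal for long-horizon credit assignment.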
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 21403