Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
Keywords: Offline Goal-Conditioned Reinforcement Learning, Reward Shaping
TL;DR: We propose a novel and effective reward-shaping method for credit assignment based on generative modeling of the occupancy measure and optimal transport, demonstrating state-of-the-art performance in offline GCRL.
Abstract: While offline goal-conditioned reinforcement learning (GCRL) provides a simple recipe to train generalist policies from large unlabeled datasets, offline GCRL agents trained with sparse rewards typically struggle on long-horizon tasks. Manually designing task-specific reward functions undermines the simplicity, scalability, and generality of this paradigm. Moreover, prior approaches to learning rewards for effective credit assignment fail to adequately capture goal-reaching information as tasks scale in complexity. To address this gap, we propose $\textrm{\textbf{Occupancy Reward Shaping (ORS)}}$, a novel reward-shaping approach that leverages a learned occupancy measure, a distribution that naturally captures complex long-horizon temporal dependencies between states, and distills goal-reaching information from it into a general-purpose reward function for effective credit assignment. We demonstrate that ORS achieves a $\mathbf{2.3\times}$ average improvement in performance over its base RL algorithm across a diverse set of long-horizon locomotion and manipulation tasks and outperforms prior state-of-the-art methods.
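To make the core idea concrete, here is a minimal sketch of occupancy-based reward shaping in the general, potential-based form; it is illustrative only and not the authors' ORS implementation. The `OccupancyModel` class, its `log_prob` method, and the `alpha` weight are hypothetical stand-ins for a learned generative model of the discounted future state-occupancy distribution.

```python
import numpy as np

class OccupancyModel:
    """Hypothetical stand-in for a learned generative occupancy model.

    Given a state s, it scores how likely goal g is under the discounted
    future state-occupancy distribution. A real model would be learned
    (e.g., a normalizing flow or VAE); here we use a toy Gaussian density
    centered on the current state purely for illustration.
    """

    def log_prob(self, state: np.ndarray, goal: np.ndarray) -> float:
        return float(-0.5 * np.sum((goal - state) ** 2))


def shaped_reward(occ: OccupancyModel,
                  state: np.ndarray,
                  next_state: np.ndarray,
                  goal: np.ndarray,
                  sparse_reward: float,
                  alpha: float = 1.0,
                  gamma: float = 0.99) -> float:
    """Sparse goal reward plus a potential-based shaping term.

    The potential is the occupancy model's goal log-likelihood, so the
    shaping bonus is positive when a transition moves to a state from
    which the goal is more likely to be reached.
    """
    phi_next = occ.log_prob(next_state, goal)  # potential at next state
    phi = occ.log_prob(state, goal)            # potential at current state
    return sparse_reward + alpha * (gamma * phi_next - phi)
```

Because the shaping term is potential-based, it preserves the optimal policy of the underlying sparse-reward MDP while densifying the learning signal for long-horizon credit assignment.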
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 21403