Abstract: In this paper, we propose a novel method for learning reward functions directly from offline demonstrations.
Unlike traditional inverse reinforcement learning (IRL), our approach decouples the reward function from the learner's policy, eliminating the adversarial interaction typically required between the two.
This results in a more stable and efficient training process.
Our reward module, \textit{SR-Reward}, leverages the successor representation (SR) to encode a state based on the expected visitation of future states under the demonstration policy and transition dynamics.
By utilizing the Bellman equation, SR-Reward can be learned concurrently with most reinforcement learning (RL) algorithms without altering the existing training pipeline.
We also introduce a negative sampling strategy to mitigate overestimation errors by reducing rewards for out-of-distribution data, thereby enhancing robustness.
This strategy introduces an inherent conservative bias into RL algorithms that employ the learned reward, encouraging them to stay close to the demonstrations where the consequences of the actions are better understood.
We evaluate our method on D4RL benchmarks as well as Maniskill robot-manipulation environments, achieving results competitive with offline RL algorithms that have access to the true reward, as well as imitation learning (IL) techniques such as behavioral cloning.
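For reference, the successor representation the abstract refers to satisfies the standard Bellman recursion below; this is a generic sketch of the quantity SR-Reward builds on (written here with illustrative features $\phi$ and discount $\gamma$), not the paper's exact formulation:

$$\psi^{\pi}(s,a) \;=\; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty}\gamma^{t}\,\phi(s_t,a_t)\,\Big|\,s_0=s,\,a_0=a\Big] \;=\; \phi(s,a) \;+\; \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a),\,a'\sim\pi(\cdot\mid s')}\big[\psi^{\pi}(s',a')\big],$$

which is why it can be learned with the same temporal-difference machinery used for standard RL value functions.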
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=LxXOqhPYEw&
Changes Since Last Submission: - The primary revision in this resubmission fixes a bug in our evaluation and plotting code that previously produced artificially inflated standard deviations and called the true performance of our method into question; with this issue resolved, the results now allow a more accurate assessment of its effectiveness.
- An overview plot has been added to clearly illustrate the training pipeline using SR-Reward (our proposed reward module) compared to the standard RL pipeline; this should clarify the role of SR-Reward in training offline RL agents.
- We have further revised the writing of the paper and incorporated the suggestions from the previous submission.
- Appendix F: Shows that policies trained with SR-Reward and with the true reward behave similarly (similar action trajectories).
- Appendix G: Discusses the sensitivity of the negative-sampling hyperparameters and how to choose them.
- Appendix H: Studies the performance of SR-Reward in online RL settings (TD3) compared to the true reward and discusses its limitations.
Assigned Action Editor: ~Nino_Vieillard1
Submission Number: 4139