Keywords: Inverse Reinforcement Learning, Transfer Learning, Abstraction
TL;DR: We propose TraIRL, a method that learns transferable rewards by mapping ground states into a shared abstract state space via a VAE. This enables learning an abstract reward function that generalizes to unseen target tasks.
Abstract: Inverse reinforcement learning (IRL) has made significant progress in recovering reward functions from expert demonstrations. However, a key challenge remains: how to extract reward functions that generalize across related but distinct task instances. In this paper, we address this by focusing on transferable IRL, i.e., learning intrinsic rewards that can drive effective behavior in unseen but structurally aligned environments. Our method leverages a variational autoencoder (VAE) to learn an abstract representation of the state space shared across multiple source task instances. This abstracted space captures high-level features that are invariant across tasks, enabling the learning of a unified abstract reward function. The learned reward is then used to train policies in a separate, previously unseen target instance without requiring new demonstrations in that instance. We evaluate our approach on multiple environments from Gymnasium and AssistiveGym, demonstrating that the learned abstract rewards consistently support successful policy learning in novel task settings.
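To make the two-component setup described in the abstract concrete, below is a minimal illustrative sketch in PyTorch: a VAE that encodes ground states into a shared latent space, and a reward network defined on that latent space. All names (`StateVAE`, `AbstractReward`), layer sizes, and dimensions are hypothetical choices for illustration; they are not taken from the paper, and the actual TraIRL architecture and training procedure may differ.

```python
# Illustrative sketch only: hypothetical architecture, not the authors' implementation.
import torch
import torch.nn as nn


class StateVAE(nn.Module):
    """Maps ground states into a shared abstract latent space (hypothetical sizes)."""
    def __init__(self, state_dim: int, latent_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_var = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, state_dim)
        )

    def encode(self, s):
        h = self.encoder(s)
        return self.mu(h), self.log_var(h)

    def reparameterize(self, mu, log_var):
        std = torch.exp(0.5 * log_var)
        return mu + std * torch.randn_like(std)

    def forward(self, s):
        mu, log_var = self.encode(s)
        z = self.reparameterize(mu, log_var)
        return self.decoder(z), mu, log_var


class AbstractReward(nn.Module):
    """Reward defined on the abstract latent space, shared across task instances."""
    def __init__(self, latent_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, z):
        return self.net(z)


def vae_loss(recon, s, mu, log_var, beta: float = 1.0):
    """Standard reconstruction + KL objective for the state VAE."""
    recon_loss = nn.functional.mse_loss(recon, s, reduction="mean")
    kld = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    return recon_loss + beta * kld


if __name__ == "__main__":
    state_dim = 17  # e.g. a Gymnasium MuJoCo observation size (illustrative)
    vae = StateVAE(state_dim)
    reward_fn = AbstractReward()

    # Batch of ground states from source task instances (random stand-in data).
    states = torch.randn(32, state_dim)
    recon, mu, log_var = vae(states)
    print("VAE loss:", vae_loss(recon, states, mu, log_var).item())

    # At transfer time: encode target-instance states and query the abstract reward.
    z_target, _ = vae.encode(torch.randn(5, state_dim))
    print("abstract rewards:", reward_fn(z_target).squeeze(-1))
```

Under these assumptions, the policy trained in the unseen target instance would receive `reward_fn(encode(state))` as its reward signal; how the abstract reward itself is fitted from source demonstrations is specified in the paper, not in this sketch.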
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 13836