Keywords: Inverse Reinforcement Learning, Transfer Learning, Abstraction
TL;DR: We propose TraIRL, a method that learns transferable rewards by mapping ground states into a shared abstract state space via a VAE. This enables learning an abstract reward function that generalizes to unseen target tasks.
Abstract: Inverse reinforcement learning (IRL) has made significant progress in recovering reward functions from expert demonstrations. However, a key challenge remains: how to extract reward functions that generalize across related but distinct task instances. In this paper, we address this by focusing on transferable IRL, i.e., learning intrinsic rewards that can drive effective behavior in unseen but structurally aligned environments. Our method leverages a variational autoencoder (VAE) to learn an abstract representation of the state space shared across multiple source task instances. This abstracted space captures high-level features that are invariant across tasks, enabling the learning of a unified abstract reward function. The learned reward is then used to train policies in a separate, previously unseen target instance without requiring new demonstrations in that instance. We evaluate our approach on multiple environments from Gymnasium and AssistiveGym, demonstrating that the learned abstract rewards consistently support successful policy learning in novel task settings.
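To make the two-component setup described in the abstract concrete, below is a minimal illustrative sketch in PyTorch: a VAE that encodes ground states into a shared latent space, and a reward network defined on that latent space. All names (`StateVAE`, `AbstractReward`), layer sizes, and dimensions are hypothetical choices for illustration; they are not taken from the paper, and the actual TraIRL architecture and training procedure may differ.

```python
# Illustrative sketch only: hypothetical architecture, not the authors' implementation.
import torch
import torch.nn as nn


class StateVAE(nn.Module):
    """Maps ground states into a shared abstract latent space (hypothetical sizes)."""
    def __init__(self, state_dim: int, latent_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_var = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, state_dim)
        )

    def encode(self, s):
        h = self.encoder(s)
        return self.mu(h), self.log_var(h)

    def reparameterize(self, mu, log_var):
        std = torch.exp(0.5 * log_var)
        return mu + std * torch.randn_like(std)

    def forward(self, s):
        mu, log_var = self.encode(s)
        z = self.reparameterize(mu, log_var)
        return self.decoder(z), mu, log_var


class AbstractReward(nn.Module):
    """Reward defined on the abstract latent space, shared across task instances."""
    def __init__(self, latent_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, z):
        return self.net(z)


def vae_loss(recon, s, mu, log_var, beta: float = 1.0):
    """Standard reconstruction + KL objective for the state VAE."""
    recon_loss = nn.functional.mse_loss(recon, s, reduction="mean")
    kld = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    return recon_loss + beta * kld


if __name__ == "__main__":
    state_dim = 17  # e.g. a Gymnasium MuJoCo observation size (illustrative)
    vae = StateVAE(state_dim)
    reward_fn = AbstractReward()

    # Batch of ground states from source task instances (random stand-in data).
    states = torch.randn(32, state_dim)
    recon, mu, log_var = vae(states)
    print("VAE loss:", vae_loss(recon, states, mu, log_var).item())

    # At transfer time: encode target-instance states and query the abstract reward.
    z_target, _ = vae.encode(torch.randn(5, state_dim))
    print("abstract rewards:", reward_fn(z_target).squeeze(-1))
```

Under these assumptions, the policy trained in the unseen target instance would receive `reward_fn(encode(state))` as its reward signal; how the abstract reward itself is fitted from source demonstrations is specified in the paper, not in this sketch.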
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 13836