Keywords: Inverse reinforcement learning, implicit reward models, model-based, credit assignment
TL;DR: We study how the learning dynamics and credit assignment mechanisms differ between implicit and explicit IRL, and propose a new implicit model-based offline IRL algorithm that improves on prior model-free approaches.
Abstract: Inverse reinforcement learning (IRL) alleviates the practical challenges of reward design by extracting reward functions from approximately rational demonstrators. Despite its theoretical advantages, IRL has not seen as much adoption as Behavior Cloning (BC), which does not require repeatedly solving a complex RL inner problem and is completely offline. Recently, a new class of IRL algorithms has proposed an *implicit* reward function parameterization, which enables directly updating the Q function without an RL inner loop or a reward model, making these algorithms more similar to BC, more memory efficient, and potentially easier to scale. In this paper, we aim to understand how implicit IRL differs from explicit IRL. We analyze their distinct learning dynamics, preference learning, and credit assignment mechanisms, and suggest that learning a dynamics model can overcome the dataset challenges of prior model-free approaches. We propose a new algorithm that extends implicit IRL to the offline model-based setting, leveraging suboptimal datasets without requiring online training. On the D4RL MuJoCo benchmarks, we show that the proposed algorithm is competitive with explicit model-based offline IRL in matching expert performance from only a few demonstrations and improves the performance of model-free baselines. Furthermore, our ablation experiments support the learning dynamics analysis of entangled preference learning and credit assignment mechanisms in implicit IRL, and suggest prioritizing preference learning as a remedy.
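For context on the implicit parameterization, here is a minimal sketch (an assumption for illustration, not necessarily the exact form used in this submission): implicit IRL methods in the style of IQ-Learn define the reward through the inverse soft Bellman operator, so updating the Q function implicitly updates the reward and no separate reward model or RL inner loop is needed. In the discrete-action, maximum-entropy setting this reads:

```latex
% Illustrative sketch (assumed form): implicit reward defined from a learned Q function
% via the inverse soft Bellman operator, as in IQ-Learn-style implicit IRL.
r(s, a) \;=\; Q(s, a) \;-\; \gamma \,\mathbb{E}_{s' \sim P(\cdot \mid s, a)}\big[ V(s') \big],
\qquad
V(s) \;=\; \log \sum_{a'} \exp Q(s, a'),
% where V is the soft value function and P is the transition dynamics.
```

Because the reward is a deterministic function of Q (and, through the expectation over next states, of the dynamics), gradient updates on a Q-based objective act directly on the implied reward, which is what allows the explicit reward model and the RL inner loop to be dropped.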
Submission Number: 8