Inverse Reinforcement Learning of Interactive Scenarios

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Inverse Reinforcement Learning
Abstract: This paper studies the problem in which a learner aims to learn, from interactions with an expert, both the expert's reward function and a policy for interacting with that expert. We formulate the problem as a stochastic bi-level optimization problem in which the lower level learns a reward function that explains the expert's behavior and the upper level learns a policy for interacting with the expert. We develop a double-loop algorithm, General Scenario Interactive Inverse Reinforcement Learning (GSIIRL), which solves the lower-level optimization problem in the inner loop and the upper-level optimization problem in the outer loop. We formally guarantee that GSIIRL converges at a rate of $\mathcal{O}(\frac{1}{\sqrt{K}})$ and empirically validate the algorithm through simulations.
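The double-loop structure the abstract describes (an inner loop that fits the reward to the expert's behavior, an outer loop that updates the interaction policy) can be sketched on a toy problem. The quadratic objectives, step sizes, and iteration counts below are illustrative assumptions for exposition, not the paper's actual GSIIRL updates.

```python
# Toy sketch of a double-loop bi-level scheme (illustrative, not GSIIRL itself).
# Lower level: fit a scalar reward parameter theta given the current policy phi.
# Upper level: update the policy parameter phi given the fitted theta.

def lower_loss_grad(theta, phi):
    # Gradient of the toy lower-level loss 0.5 * (theta - phi)^2,
    # minimized at theta = phi (reward "explains" behavior under phi).
    return theta - phi

def upper_loss_grad(phi, theta):
    # Gradient of the toy upper-level loss
    # 0.5 * (phi - 1.0)^2 + 0.5 * (phi - theta)^2.
    return (phi - 1.0) + (phi - theta)

def double_loop(K=200, inner_steps=50, lr_in=0.1, lr_out=0.05):
    theta, phi = 0.0, 0.0
    for _ in range(K):                 # outer loop: upper-level policy step
        for _ in range(inner_steps):   # inner loop: lower-level reward fit
            theta -= lr_in * lower_loss_grad(theta, phi)
        phi -= lr_out * upper_loss_grad(phi, theta)
    return theta, phi

theta, phi = double_loop()
```

With these toy objectives, the inner loop drives theta toward phi, and the outer loop then drives phi (and hence theta) toward 1.0, illustrating how the two levels are coupled.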
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 15664