Contextual Policies Enable Efficient and Interpretable Inverse Reinforcement Learning for Populations

Published: 10 Jul 2024, Last Modified: 10 Jul 2024. Accepted by TMLR.
Abstract: Inverse reinforcement learning (IRL) methods learn a reward function from expert demonstrations such as human behavior, offering a practical solution for crafting reward functions for complex environments. However, IRL is computationally expensive when applied to large populations of demonstrators, as existing IRL algorithms require solving a separate reinforcement learning (RL) problem for each individual. We propose a new IRL approach that relies on contextual RL, where an optimal policy is learned for multiple contexts. We first learn a contextual policy that directly provides the RL solution for a parametric family of reward functions, and then reuse it for IRL on each individual in the population. We motivate our method with the scenario of AI-driven playtesting of video games and focus on an interpretable family of reward functions. We evaluate the method on a navigation task and the battle arena game Derk, where it successfully recovers distinct player reward preferences from a simulated population and provides substantial time savings compared to a strong adversarial IRL baseline.
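To make the idea in the abstract concrete, here is a minimal sketch (not the authors' implementation) of reusing one contextual policy for population IRL: the reward family is assumed to be linear in features, the contextual soft-optimal policy is a toy stand-in, and all names (contextual_policy, irl_for_individual, PHI) are illustrative assumptions.

```python
# Sketch: pre-train one contextual policy pi(a | s, theta) for a parametric
# reward family r_theta(s, a) = theta . phi(s, a), then run IRL for each
# individual by optimizing only over theta, with no per-individual RL solve.
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, D = 4, 3                      # actions, reward-feature dimension
PHI = rng.normal(size=(N_ACTIONS, D))    # phi(s, a): toy, state-free features

def contextual_policy(theta):
    """Stand-in for the pre-trained contextual policy: a soft-optimal
    (softmax) policy for the reward r_theta(a) = theta . phi(a)."""
    logits = PHI @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def irl_for_individual(demos, lr=0.5, steps=200):
    """Recover one individual's theta by maximizing the likelihood of their
    demonstrated actions under the shared contextual policy."""
    theta = np.zeros(D)
    counts = np.bincount(demos, minlength=N_ACTIONS)
    for _ in range(steps):
        p = contextual_policy(theta)
        # Log-likelihood gradient of a softmax policy:
        # sum_a counts[a] * phi(a) - len(demos) * E_p[phi]
        grad = (counts - len(demos) * p) @ PHI
        theta += lr * grad / len(demos)
    return theta

# Simulated population: each individual acts under their own hidden theta.
population = [rng.normal(size=D) for _ in range(3)]
for true_theta in population:
    demos = rng.choice(N_ACTIONS, size=500, p=contextual_policy(true_theta))
    est = irl_for_individual(demos)
    # Compare induced preference orderings over actions (theta is only
    # identifiable up to the usual reward ambiguities such as scaling).
    print(np.argsort(-PHI @ true_theta), np.argsort(-PHI @ est))
```

In this toy version the expensive step (training the contextual policy) is done once, and each individual's IRL reduces to a small optimization over the reward parameters theta, which is the source of the time savings the abstract describes.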
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Incorporated the modifications suggested in the accept-with-minor-revisions decision. Deanonymized. Updated the code (the warm-start experiment was missing).
Supplementary Material: zip
Assigned Action Editor: ~Marcello_Restelli1
Submission Number: 1973