Deep Implicit Imitation Reinforcement Learning in Heterogeneous Action Settings

Published: 2025, Last Modified: 12 May 2025, AAAI 2025, CC BY-SA 4.0
Abstract: Implicit imitation reinforcement learning (IIRL) is a framework that aims to aid a trainee agent's learning by having it observe the state transitions of a mentor, without access to the mentor's action information. Standard IIRL assumes a shared Markov decision process (MDP) between the mentor and trainee, and consequently an identical action space. This restriction limits the applicability of implicit imitation frameworks in real-life scenarios where, possibly due to variations in physical characteristics, the mentor agent may possess its own distinct action set, creating a heterogeneous action setting. In this work, we extend the deep implicit imitation Q-networks (DIIQN) method (an online, model-free, deep RL algorithm for implicit imitation) to allow for heterogeneous action sets between mentor and trainee agents. Equipped with our heterogeneous actions DIIQN (HA-DIIQN) method, a trainee agent can reap the benefits of IIRL even in heterogeneous action settings, achieving accelerated learning and outperforming non-optimal mentor agents.
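To make the core idea concrete, the following is a minimal, hypothetical Python sketch of implicit imitation under heterogeneous actions, not the paper's actual HA-DIIQN algorithm: a tabular trainee infers which of its own actions best explains an observed mentor transition and shapes its Q-update accordingly. All names (dynamics_model, infer_own_action, implicit_imitation_update) and the bonus-shaping scheme are illustrative assumptions, and a tabular model stands in for the deep Q-network of the actual method.

```python
import numpy as np

# Sketch of the implicit-imitation idea: the trainee never sees the mentor's
# actions, only (state, next_state) pairs. It infers which of ITS OWN actions
# best reproduces each observed transition, which remains well-defined even
# when the two agents' action sets differ (the heterogeneous-action setting).

rng = np.random.default_rng(0)

N_STATES = 10
TRAINEE_ACTIONS = 4  # the mentor may have a different (unknown) action set

# A learned forward model of the trainee's own dynamics: a next-state
# distribution for each (state, action) pair; shape (states, actions, states).
dynamics_model = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, TRAINEE_ACTIONS))

def infer_own_action(state, mentor_next_state):
    """Return the trainee action most likely to reproduce the mentor's
    observed transition, or None if no own action matches well enough."""
    probs = dynamics_model[state, :, mentor_next_state]
    best = int(np.argmax(probs))
    # In a heterogeneous setting, some mentor transitions may be unreachable
    # with the trainee's own actions; such transitions are skipped.
    return best if probs[best] > 1.0 / N_STATES else None

# Q-learning update shaped by the inferred action (illustrative constants).
Q = np.zeros((N_STATES, TRAINEE_ACTIONS))
ALPHA, GAMMA, BONUS = 0.1, 0.95, 0.5

def implicit_imitation_update(state, mentor_next_state, reward):
    action = infer_own_action(state, mentor_next_state)
    if action is None:
        return  # mentor transition not imitable; fall back to ordinary RL
    # Standard TD target plus a small bonus for imitable transitions,
    # steering the trainee toward mentor-visited regions of the state space.
    target = reward + BONUS + GAMMA * Q[mentor_next_state].max()
    Q[state, action] += ALPHA * (target - Q[state, action])

# Example: one observed mentor transition with its trainee-side reward.
implicit_imitation_update(state=3, mentor_next_state=7, reward=1.0)
print(Q[3])
```

The feasibility check in infer_own_action is what distinguishes the heterogeneous-action setting from standard implicit imitation: when the mentor's action set differs from the trainee's, some observed transitions carry no usable guidance, and the trainee must detect and ignore them rather than imitate blindly.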