Abstract: Interactive Imitation Learning (IIL) allows agents to acquire desired behaviors through human interventions, but current methods impose high cognitive demands on human supervisors. We propose the Adaptive Intervention Mechanism (AIM), a novel robot-gated IIL algorithm that learns an adaptive criterion for requesting human demonstrations. AIM uses a proxy Q-function to mimic the human intervention rule and adjusts intervention requests based on the alignment between agent and human actions. By assigning high Q-values when the agent deviates from the expert and decreasing these values as the agent becomes proficient, the proxy Q-function enables the agent to assess its real-time alignment with the expert and request assistance when needed. Our expert-in-the-loop experiments show that AIM significantly reduces expert monitoring effort in both continuous and discrete control tasks. Compared to the uncertainty-based baseline ThriftyDAgger, our method achieves a 40% improvement in human take-over cost and learning efficiency.
Furthermore, AIM effectively identifies safety-critical states for expert assistance, thereby collecting higher-quality expert demonstrations and reducing the total expert data and environment interactions required. Code and demo video are available at https://github.com/metadriverse/AIM.
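To make the abstract's robot-gated criterion concrete, here is a minimal sketch of how a proxy Q-value could gate intervention requests. The toy proxy Q-function below (a simple distance between agent and expert actions) and the threshold are illustrative assumptions; AIM learns this value function from intervention data rather than computing it this way.

```python
import numpy as np

def make_toy_proxy_q(expert_policy):
    """Toy stand-in for a learned proxy Q-function (assumption, not the
    paper's model): score the agent's action by its distance from the
    expert's action, so larger values mean greater misalignment."""
    def proxy_q(state, action):
        expert_action = np.asarray(expert_policy(state), dtype=float)
        return float(np.linalg.norm(np.asarray(action, dtype=float) - expert_action))
    return proxy_q

def should_request_help(proxy_q, state, agent_action, threshold=0.5):
    # Robot-gated rule: request an expert demonstration only when the
    # proxy value is high, i.e., the agent appears misaligned with the
    # expert. As the agent improves, proxy values drop and requests stop.
    return proxy_q(state, agent_action) > threshold
```

In this sketch, a near-expert action falls below the threshold and no request is made, while a strongly deviating action triggers a demonstration request.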
Lay Summary: Human-in-the-loop imitation learning often demands constant expert supervision, leading to wasted effort and human fatigue in training autonomous agents. We introduce AIM, which trains a lightweight intervention detector that monitors the agent’s own confidence and safety margins. It requests expert demonstrations only when the agent is likely to make mistakes or act unsafely. As training continues, AIM automatically reduces interruptions, since the detector grows more accurate and the agent’s competence increases. In simulation benchmarks for driving and navigation, AIM matches or exceeds the performance of existing methods while cutting demonstration requests. By concentrating expert attention on critical moments, AIM accelerates learning, lowers human workload, and paves the way for scalable, trustworthy human-AI collaboration.
Link To Code: https://github.com/metadriverse/AIM
Primary Area: Reinforcement Learning
Keywords: Imitation Learning, Human-in-the-loop Reinforcement Learning, Shared Autonomy
Submission Number: 15110