Keywords: Human-in-the-loop Imitation Learning, Dynamic Regret, Online Learning, Off-policy Imitation Learning
TL;DR: We propose a human-in-the-loop learning method that achieves faithful imitation via distribution alignment and adapts to evolving behavior using dynamic regret minimization.
Abstract: Human-in-the-loop (HIL) imitation learning enables agents to learn complex behaviors safely through real-time human intervention. However, existing methods struggle to efficiently leverage agent-generated data because trajectory distributions evolve dynamically and human intervention delays introduce imperfections, so they often fail to faithfully imitate the human expert policy. In this work, we propose Faithful Dynamic Imitation Learning (FaithDaIL) to address these challenges. We formulate HIL imitation learning as an online non-convex optimization problem and employ dynamic regret minimization to adapt to the shifting data distribution and track high-quality policy trajectories.
To ensure faithful imitation of the human expert despite training on mixed agent and human data, we introduce an unbiased imitation objective and realize it by weighting the behavior distribution relative to the human expert's distribution, using this relative weight as a proxy reward.
Extensive experiments on MetaDrive and CARLA driving benchmarks demonstrate that FaithDaIL achieves state-of-the-art performance in safety and task success with significantly reduced human intervention data compared to prior HIL baselines.
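For readers unfamiliar with the term, the dynamic regret referenced in the abstract is the standard online-learning quantity stated below; the specific loss sequence $f_t$ and comparator sequence $u_t$ used by FaithDaIL are not given on this page, so both should be read as generic placeholders.

```latex
% Dynamic regret of iterates \theta_1,\dots,\theta_T against a time-varying
% comparator sequence u_1,\dots,u_T (placeholders; the paper's choices may differ).
\mathrm{Regret}^{\mathrm{dyn}}_T
  \;=\; \sum_{t=1}^{T} f_t(\theta_t) \;-\; \sum_{t=1}^{T} f_t(u_t)
```

Unlike static regret, the comparator is allowed to change with $t$, which is what lets the learner track a data distribution that shifts as the agent's and the human's behavior evolves.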
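The proxy reward built from the relative weight of the behavior distribution and the expert distribution can be illustrated with a simple density-ratio estimate. The sketch below is an assumption-laden illustration, not the paper's implementation: it trains a logistic classifier to separate expert from behavior samples and uses the classifier-implied ratio d_E/d_B as the reward; the function names `fit_ratio_classifier` and `proxy_reward` are hypothetical.

```python
import numpy as np

# Illustrative sketch (not the paper's method): estimate the expert-to-behavior
# density ratio with a logistic classifier and use it as a proxy reward.

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_ratio_classifier(expert_feats, behavior_feats, lr=0.1, steps=2000):
    """Logistic regression labelling expert samples 1 and behavior samples 0."""
    X = np.vstack([expert_feats, behavior_feats])
    y = np.concatenate([np.ones(len(expert_feats)), np.zeros(len(behavior_feats))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = _sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)   # gradient step on the logistic loss
        b -= lr * np.mean(p - y)
    return w, b

def proxy_reward(feats, w, b, eps=1e-6):
    """Classifier-implied density ratio d_E/d_B = D(x) / (1 - D(x))."""
    d = _sigmoid(feats @ w + b)
    return d / np.clip(1.0 - d, eps, None)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expert = rng.normal(loc=1.0, size=(500, 4))     # stand-in expert features
    behavior = rng.normal(loc=0.0, size=(500, 4))   # stand-in agent/behavior features
    w, b = fit_ratio_classifier(expert, behavior)
    print(proxy_reward(behavior[:5], w, b))         # larger where behavior looks expert-like
```

In an HIL setting such an estimator would have to be refreshed as the behavior distribution shifts, which is where the dynamic-regret view above becomes relevant.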
Supplementary Material: zip
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 15339