Keywords: imitation learning, adversarial imitation learning, sample efficiency
Abstract: Adversarial imitation learning (AIL) achieves superior expert sample efficiency compared to behavioral cloning (BC) but requires extensive online environment interactions. Recent empirical works attempt to mitigate this limitation by augmenting AIL with BC, for instance by initializing AIL algorithms with BC-pretrained policies. Despite these empirical successes, a systematic theoretical analysis of the provable efficiency gains remains lacking. This paper provides rigorous theoretical guarantees and develops an effective algorithm to accelerate AIL. First, we analyze AIL with policy pretraining alone, revealing a critical but theoretically unexplored limitation: the absence of reward pretraining. Building on this insight, we derive a principled reward pretraining method grounded in a reward-shaping-based analysis. Crucially, our analysis reveals a fundamental connection between the expert policy and the shaping reward, naturally giving rise to CoPT-AIL, an approach that jointly pretrains the policy and the reward through a single BC procedure. Theoretical results show that CoPT-AIL achieves an improved imitation gap bound compared to standard AIL without pretraining, providing the first theoretical guarantee for the benefits of pretraining in AIL. Experimental evaluation confirms CoPT-AIL's superior performance over prior AIL methods.
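The abstract describes joint policy and reward pretraining from a single BC procedure without specifying the exact construction. The following is a minimal sketch of that idea, under the assumption that the shaping reward is taken to be the log-probability of actions under the BC-pretrained policy; the network architecture, the `expert_states`/`expert_actions` placeholders, and all hyperparameters are illustrative assumptions, not the paper's stated method.

```python
# Hedged sketch: pretrain a policy with BC, then reuse the same BC model
# to define a shaping reward for the downstream AIL learner.
# Assumed shaping form: r_shape(s, a) = log pi_BC(a | s).
import torch
import torch.nn as nn


class GaussianPolicy(nn.Module):
    """Small MLP policy with a state-independent log-std."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def log_prob(self, states, actions):
        dist = torch.distributions.Normal(self.mean_net(states), self.log_std.exp())
        return dist.log_prob(actions).sum(dim=-1)


def bc_pretrain(policy, expert_states, expert_actions, epochs=50, lr=3e-4):
    """Behavioral cloning: maximize log-likelihood of expert actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        loss = -policy.log_prob(expert_states, expert_actions).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy


def shaping_reward(bc_policy, states, actions):
    """Shaping reward derived from the BC policy (assumed form: log pi_BC(a|s))."""
    with torch.no_grad():
        return bc_policy.log_prob(states, actions)


if __name__ == "__main__":
    # Synthetic tensors stand in for an expert demonstration dataset.
    expert_states = torch.randn(256, 11)
    expert_actions = torch.randn(256, 3)
    pi_bc = bc_pretrain(GaussianPolicy(11, 3), expert_states, expert_actions)
    # An AIL learner would start from pi_bc (policy pretraining) and add
    # shaping_reward(pi_bc, s, a) to its adversarial reward (reward pretraining).
    print(shaping_reward(pi_bc, expert_states[:5], expert_actions[:5]))
```

In this reading, both pretrained components come from the single BC fit: the BC policy initializes the AIL policy, and its log-probabilities initialize (or shape) the reward, which is consistent with the abstract's claim of a connection between the expert policy and the shaping reward.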
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 17861