Keywords: imitation learning, adversarial imitation learning, sample efficiency
Abstract: Adversarial imitation learning (AIL) achieves superior expert sample efficiency compared to behavioral cloning (BC) but requires extensive online environment interactions. Recent empirical works attempt to mitigate this limitation by augmenting AIL with BC, for instance by initializing AIL algorithms with BC-pretrained policies. Despite these empirical successes, a systematic theoretical analysis of the provable efficiency gains remains lacking. This paper provides rigorous theoretical guarantees and develops an effective algorithm to accelerate AIL. First, we analyze AIL with policy pretraining alone, revealing a critical but theoretically unexplored limitation: the absence of reward pretraining. Building on this insight, we derive a principled reward pretraining method grounded in a reward-shaping-based analysis. Crucially, our analysis reveals a fundamental connection between the expert policy and the shaping reward, naturally giving rise to CoPT-AIL, an approach that jointly pretrains the policy and the reward through a single BC procedure. Theoretical results show that CoPT-AIL achieves an improved imitation gap bound compared to standard AIL without pretraining, providing the first theoretical guarantee for the benefits of pretraining in AIL. Experimental evaluation confirms CoPT-AIL's superior performance over prior AIL methods.
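The abstract describes joint policy and reward pretraining from a single BC procedure without specifying the exact construction. The following is a minimal sketch of that idea, under the assumption that the shaping reward is taken to be the log-probability of actions under the BC-pretrained policy; the network architecture, the `expert_states`/`expert_actions` placeholders, and all hyperparameters are illustrative assumptions, not the paper's stated method.

```python
# Hedged sketch: pretrain a policy with BC, then reuse the same BC model
# to define a shaping reward for the downstream AIL learner.
# Assumed shaping form: r_shape(s, a) = log pi_BC(a | s).
import torch
import torch.nn as nn


class GaussianPolicy(nn.Module):
    """Small MLP policy with a state-independent log-std."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def log_prob(self, states, actions):
        dist = torch.distributions.Normal(self.mean_net(states), self.log_std.exp())
        return dist.log_prob(actions).sum(dim=-1)


def bc_pretrain(policy, expert_states, expert_actions, epochs=50, lr=3e-4):
    """Behavioral cloning: maximize log-likelihood of expert actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        loss = -policy.log_prob(expert_states, expert_actions).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy


def shaping_reward(bc_policy, states, actions):
    """Shaping reward derived from the BC policy (assumed form: log pi_BC(a|s))."""
    with torch.no_grad():
        return bc_policy.log_prob(states, actions)


if __name__ == "__main__":
    # Synthetic tensors stand in for an expert demonstration dataset.
    expert_states = torch.randn(256, 11)
    expert_actions = torch.randn(256, 3)
    pi_bc = bc_pretrain(GaussianPolicy(11, 3), expert_states, expert_actions)
    # An AIL learner would start from pi_bc (policy pretraining) and add
    # shaping_reward(pi_bc, s, a) to its adversarial reward (reward pretraining).
    print(shaping_reward(pi_bc, expert_states[:5], expert_actions[:5]))
```

In this reading, both pretrained components come from the single BC fit: the BC policy initializes the AIL policy, and its log-probabilities initialize (or shape) the reward, which is consistent with the abstract's claim of a connection between the expert policy and the shaping reward.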
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 17861