Adversarial Imitation Learning from Visual Observations using Latent Information

Published: 23 May 2024 · Last Modified: 23 May 2024 · Accepted by TMLR
Abstract: We focus on the problem of imitation learning from visual observations, where the learning agent has access to videos of experts as its sole learning source. The challenges of this framework include the absence of expert actions and the partial observability of the environment, as the ground-truth states can only be inferred from pixels. To tackle this problem, we first conduct a theoretical analysis of imitation learning in partially observable environments. We establish upper bounds on the suboptimality of the learning agent with respect to the divergence between the expert and the agent latent state-transition distributions. Motivated by this analysis, we introduce an algorithm called Latent Adversarial Imitation from Observations, which combines off-policy adversarial imitation techniques with a learned latent representation of the agent's state from sequences of observations. In experiments on high-dimensional continuous robotic tasks, we show that our model-free approach in latent space matches state-of-the-art performance. Additionally, we show how our method can be used to improve the efficiency of reinforcement learning from pixels by leveraging expert videos. To ensure reproducibility, we provide free access to all the learning curves and open-source our code.
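To make the core idea in the abstract concrete, the following is a minimal, illustrative sketch of adversarial imitation with a latent representation: an encoder maps observation sequences to latent states, and a discriminator over latent state transitions $(z, z')$ supplies a GAIL-style imitation reward. All function names, shapes, and parameters here are hypothetical placeholders, not the paper's actual architecture or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(obs, W):
    """Toy encoder: map an observation (or stacked sequence) to a latent state z."""
    return np.tanh(obs @ W)

def discriminator(z, z_next, theta):
    """Logistic discriminator over a latent transition (z, z'):
    outputs the probability that the transition comes from the expert."""
    x = np.concatenate([z, z_next])
    return 1.0 / (1.0 + np.exp(-(x @ theta)))

def imitation_reward(z, z_next, theta, eps=1e-8):
    """GAIL-style reward: large when the transition 'fools' the discriminator,
    i.e., looks like an expert latent transition."""
    d = discriminator(z, z_next, theta)
    return np.log(d + eps) - np.log(1.0 - d + eps)

# Toy dimensions and parameters (illustrative only).
obs_dim, latent_dim = 8, 4
W = rng.normal(size=(obs_dim, latent_dim))       # encoder weights
theta = rng.normal(size=2 * latent_dim)          # discriminator weights

# One agent transition: encode consecutive observations, score the reward.
obs_t = rng.normal(size=obs_dim)
obs_t1 = rng.normal(size=obs_dim)
z, z_next = encode(obs_t, W), encode(obs_t1, W)
r = imitation_reward(z, z_next, theta)
```

In the actual method, the discriminator would be trained to separate expert from agent latent transitions while an off-policy RL algorithm maximizes this reward; the sketch only shows how a reward can be defined from latent state-transition pairs without access to expert actions.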
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. We have further clarified the relation between the spaces $\mathcal{Z}$ and $\mathcal{S}$ and added the requested information about the policy $\pi$ in the “Latent representation in POMDP” paragraph in Section 3 (Preliminaries).
2. We removed the claim that our algorithm has better runtime efficiency than existing algorithms from the main contributions of the paper.
3. We have moved the figures showing return as a function of training steps from the Appendix to the main body of the paper.
4. We have reported the number of expert demonstrations in the captions of all tables.
5. We have further clarified the normalization step in Fig. 3 in its caption.
6. We have added “Adversarial Imitation Learning from Video Using a State Observer” (Karnan et al.) and “SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards” (Reddy et al.) as references in the Related Work section.
Supplementary Material: pdf
Assigned Action Editor: ~Florian_Shkurti1
Submission Number: 2062