DITTO: Offline Imitation Learning with World Models

Published: 01 Feb 2023, Last Modified: 13 Feb 2023
Submitted to: ICLR 2023
Readers: Everyone
Keywords: world models, imitation learning, reinforcement learning
TL;DR: Completely offline imitation learning with world models, using RL on a latent matching objective in the model.
Abstract: We propose DITTO, a fully offline approach to imitation learning which addresses the problem of covariate shift without access to an oracle or any additional online interactions. By unrolling agent policies in the latent space of a learned world model and penalizing drift from expert demonstrations, we can use online reinforcement learning algorithms to learn policies which solve the imitation objective, without access to the underlying environment or reward function. Decoupling policy and world model learning lets us leverage datasets of any quality to learn latent representations which provide a natural reward signal for imitation learning, avoiding the need for complex adversarial or sparse imitation-inducing rewards. Compared to competitive baselines, our method achieves state-of-the-art performance in a variety of challenging environments from pixel observations alone.
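Illustrative sketch: the abstract describes unrolling a policy in the latent space of a learned world model and penalizing drift from expert demonstrations. The minimal Python example below shows one way such a latent-matching reward could be computed. It is a sketch under stated assumptions, not the submission's implementation: the negative Euclidean distance reward, the additive `toy_dynamics` stand-in for a learned world model, and all function names are hypothetical choices made for illustration only.

```python
import numpy as np


def rollout_in_latent_space(z0, policy, dynamics, horizon):
    """Unroll a policy inside a latent dynamics model (no environment access).

    `dynamics` stands in for a learned world model; here it is a toy function.
    """
    z = np.asarray(z0, dtype=float)
    latents = []
    for _ in range(horizon):
        action = policy(z)
        z = dynamics(z, action)
        latents.append(z)
    return np.stack(latents)


def latent_matching_rewards(agent_latents, expert_latents):
    """Per-step reward that penalizes drift from the expert in latent space.

    Assumption: negative L2 distance between time-aligned latents. The
    abstract does not specify the exact form of the reward.
    """
    agent_latents = np.asarray(agent_latents, dtype=float)
    expert_latents = np.asarray(expert_latents, dtype=float)
    return -np.linalg.norm(agent_latents - expert_latents, axis=-1)


# Hypothetical toy stand-ins for the world model and the two policies.
def toy_dynamics(z, action):
    return z + action


def expert_policy(z):
    return np.full_like(z, 0.1)


def drifting_policy(z):
    return np.full_like(z, 0.3)


z0 = np.zeros(2)
expert = rollout_in_latent_space(z0, expert_policy, toy_dynamics, horizon=3)
agent = rollout_in_latent_space(z0, drifting_policy, toy_dynamics, horizon=3)

# Rewards become more negative as the agent drifts further from the expert.
rewards = latent_matching_rewards(agent, expert)
```

In this toy setting, a policy that reproduces the expert's latent trajectory receives zero reward at every step, while any drift yields a negative reward that a reinforcement learning algorithm could then maximize.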
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)
Supplementary Material: zip
