Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement

Michael Bloesch; Markus Wulfmeier; Philemon Brakel; Todor Davchev; Martina Zambelli; Jost Tobias Springenberg; Abbas Abdolmaleki; William F Whitney; Nicolas Heess; Roland Hafner; Martin Riedmiller

Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement

Michael Bloesch, Markus Wulfmeier, Philemon Brakel, Todor Davchev, Martina Zambelli, Jost Tobias Springenberg, Abbas Abdolmaleki, William F Whitney, Nicolas Heess, Roland Hafner, Martin Riedmiller

27 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Imitation learning from observation, self-improvement

Abstract: Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors from large-scale, mixed-quality data. Unlike behavior cloning or offline reinforcement learning, IfO leverages action-free demonstrations and circumvents the need for costly action-labeled demonstrations or carefully crafted reward functions. However, current research focuses on idealized scenarios with tailored data distributions. This paper introduces a novel algorithm to learn from datasets with varying quality, moving closer to a paradigm in which imitation learning can be performed iteratively via self-improvement. Our method extends RL-based imitation learning to action-free demonstrations, using a value function to transfer information between expert and non-expert data. Through comprehensive evaluation, we delineate the relation between different data distributions and the applicability of algorithms and highlight the limitations of established methods. Our findings provide valuable insights for developing more robust and practical IfO techniques on a path to scalable behaviour learning.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 12502

Loading