ILPO-MP: Mode Priors Prevent Mode Collapse when Imitating Latent Policies from Observations

Oliver Struckmeier; Ville Kyrki

Published: 02 Nov 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: Imitation learning from observations (IfO) constrains the classic imitation learning setting to cases where expert observations are easy to obtain but no expert actions are available. Most existing IfO methods require access to task-specific cost functions or many interactions with the target environment. Learning a forward dynamics model in combination with a latent policy has been shown to solve these issues. However, the limited supervision in the IfO setting can lead to mode collapse when learning the generative forward dynamics model and the corresponding latent policy. In this paper, we analyze the mode collapse problem in this setting and show that it is caused by a combination of deterministic expert data and poor initialization of the models. Under the assumption of piecewise continuous system dynamics, we propose ILPO-MP, a method that prevents mode collapse by clustering expert transitions to impose a mode prior on the generative model and the latent policy. We show that ILPO-MP prevents mode collapse and improves performance in a variety of environments.
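The idea of clustering expert transitions to obtain a mode prior can be illustrated with a small sketch. It is not the authors' implementation and rests on assumptions made only for illustration: each expert transition (s, s') is represented by its state difference s' - s, the number of clusters equals the number of latent actions, and the prior enters training as a smoothed cross-entropy term on the latent policy's distribution over modes. The helper names (`state_deltas`, `agglomerative_cluster`, `mode_prior`, `prior_loss`) and the smoothing constant `eps` are hypothetical. Naive centroid-linkage agglomerative clustering is used because the changelog mentions agglomerative clustering; the paper's actual clustering and alignment procedure (including its use of Procrustes) may differ.

```python
import numpy as np


def state_deltas(states, next_states):
    """Assumption: represent each expert transition (s, s') by its state difference s' - s."""
    return np.asarray(next_states, dtype=float) - np.asarray(states, dtype=float)


def agglomerative_cluster(x, n_clusters):
    """Naive centroid-linkage agglomerative clustering; O(n^3), for small examples only."""
    x = np.asarray(x, dtype=float)
    clusters = [[i] for i in range(len(x))]  # start with one cluster per transition
    while len(clusters) > n_clusters:
        centroids = np.array([x[c].mean(axis=0) for c in clusters])
        # Pairwise Euclidean distances between cluster centroids.
        dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)  # never merge a cluster with itself
        a, b = np.unravel_index(np.argmin(dist), dist.shape)
        a, b = int(min(a, b)), int(max(a, b))
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair of clusters
        del clusters[b]
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def mode_prior(labels, n_modes, eps=1e-3):
    """Hypothetical smoothed one-hot prior over latent modes; each row sums to 1."""
    onehot = np.eye(n_modes)[labels]
    return (1.0 - eps) * onehot + eps / n_modes


def prior_loss(policy_logits, prior):
    """Mean cross-entropy between the prior and the latent policy's softmax over modes."""
    z = policy_logits - policy_logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(prior * log_probs).sum(axis=1).mean())


# Toy expert data: three "move right" and three "move up" transitions with small noise.
states = np.zeros((6, 2))
next_states = np.array([[1.0, 0.0], [1.02, 0.01], [0.98, -0.01],
                        [0.0, 1.0], [0.01, 1.03], [-0.02, 0.97]])
labels = agglomerative_cluster(state_deltas(states, next_states), n_clusters=2)
prior = mode_prior(labels, n_modes=2)
print(labels)  # → [0 0 0 1 1 1]
```

Because latent actions are unlabeled indices, any permutation of the cluster labels is an equally valid prior in this sketch. What the prior contributes is that each expert mode is tied to a distinct latent action, so a single latent action cannot absorb all transitions regardless of how the models were initialized.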

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:
Changelog - Camera-ready version uploaded
Changelog - Version 4
- Fixed the incomplete caption of Figure 3
Changelog - Version 3
- Added discussion on [1] in 2.2
- Highlighted the novelty of learning forward dynamics instead of backward dynamics in 2.2 and 3
Changelog - Version 2
- Clearer definition of the piecewise continuity assumption in 4.2
- Discussion of why the work extends to more complex datasets (new Section 6.1)
- Added motivation for the selection of mode priors to solve the mode collapse problem in 4.2
- Fixed notation of $\Delta t$
- Clarified the use and definition of the term "transition"
- Improved explanations and figure captions in Section 4.2 and Figure 4
- Added citations and discussions for Procrustes and agglomerative clustering in 4.2
- Added references for $z^l$
- Added more details on Guided-ILPO in 4.3
- Clarified the discussion of noise in human actions in 4.1 and added a citation to Mandlekar et al. (2021)

Assigned Action Editor: Mathieu Salzmann

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 1214