One-Shot Imitation with Skill Chaining using a Goal-Conditioned Policy in Long-Horizon Control

Hayato Watahiki; Yoshimasa Tsuruoka

Published: 27 Apr 2022, Last Modified: 05 May 2023, ICLR 2022 GPL Poster, Readers: Everyone

Keywords: imitation learning, one-shot learning, offline dataset

TL;DR: We introduce a one-shot imitation algorithm for long-horizon tasks that connects subtask-local skills using a goal-conditioned policy.

Abstract: Recent advances in skill learning from a task-agnostic offline dataset enable the agent to acquire various skills that can be used as primitives to perform long-horizon imitation. However, most work implicitly assumes that the offline dataset covers the entire distribution of target demonstrations. If the dataset only contains subtask-local trajectories, existing methods fail to imitate the transitions between subtasks without a sufficient amount of target demonstrations, significantly limiting the scalability of these methods. In this work, we show that a simple goal-conditioned policy can imitate the missing transitions using only the target demonstrations. We combine it with a policy-switching strategy that uses the skills when they are applicable. Furthermore, we present multiple choices that can effectively evaluate the applicability of skills. Our new method successfully performs one-shot imitation with skills learned from a subtask-local offline dataset. We experimentally show that it outperforms other one-shot imitation methods in a challenging kitchen environment, and we also qualitatively analyze how each policy-switching strategy works during imitation.
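The policy-switching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `make_switching_policy`, `applicability`, and `threshold`, as well as the toy one-dimensional policies, are hypothetical, and the abstract does not say how applicability is actually scored. The sketch only assumes that some scalar score is available and that a learned skill is used when the score is high enough, with the goal-conditioned policy covering the remaining states.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]


def make_switching_policy(
    skill_policy: Callable[[State], Action],
    goal_policy: Callable[[State, State], Action],
    applicability: Callable[[State], float],
    threshold: float = 0.5,
) -> Callable[[State, State], Tuple[str, Action]]:
    """Build a policy that prefers the skill when it is judged applicable.

    All arguments are hypothetical stand-ins: `applicability` returns a
    score for the current state, and `threshold` is an assumed cutoff.
    """

    def act(state: State, goal: State) -> Tuple[str, Action]:
        # Use the learned skill inside the region it was trained on.
        if applicability(state) >= threshold:
            return "skill", skill_policy(state)
        # Otherwise fall back to the goal-conditioned policy, which is
        # meant to cover the transitions between subtasks.
        return "goal", goal_policy(state, goal)

    return act


# Toy 1-D example (assumed for illustration only).
def toy_skill(state: State) -> Action:
    return [0.1]


def toy_goal(state: State, goal: State) -> Action:
    return [goal[0] - state[0]]


def toy_applicability(state: State) -> float:
    # Pretend the skill is only valid while the state is below 1.0.
    return 1.0 if state[0] < 1.0 else 0.0


policy = make_switching_policy(toy_skill, toy_goal, toy_applicability)
```

Calling `policy([0.2], [2.0])` selects the skill branch, while `policy([1.5], [2.0])` falls back to the goal-conditioned branch. A hard threshold is only one possible switching rule; the abstract states that multiple ways of evaluating applicability are considered.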
