Guided Imitation of Task and Motion Planning

Michael James McDonald; Dylan Hadfield-Menell


Published: 13 Sept 2021, Last Modified: 27 Apr 2025. CoRL 2021 Oral.

Keywords: task and motion planning, mobile manipulation, imitation learning

Abstract: While modern policy optimization methods can perform complex manipulation from sensory data, they struggle with problems that have extended time horizons and multiple sub-goals. On the other hand, task and motion planning (TAMP) methods scale to long horizons, but they are computationally expensive and require precise tracking of the world state. We propose a method that draws on the strengths of both: we train a policy to imitate a TAMP solver's output. This produces a feed-forward policy that can accomplish multi-step tasks from sensory data. First, we build an asynchronous distributed TAMP solver that can produce supervision data fast enough for imitation learning. Then, we propose a hierarchical policy architecture that lets us use partially trained control policies to speed up the TAMP solver. In robotic manipulation tasks with 7-DoF joint control, the partially trained policies reduce planning time by a factor of up to 2.6. Among these tasks, we learn a policy that solves the RoboSuite 4-object pick-place task 88% of the time from object pose observations, and a policy that solves the RoboDesk 9-goal benchmark 79% of the time from RGB images (averaged across the 9 disparate tasks).
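The two ideas in the abstract can be illustrated with a toy sketch. The first is supervising a feed-forward policy with planner output. The second is letting a partially trained policy reduce how often the expensive planner must run. Everything in this sketch is an assumption made for illustration. The 1-D line world, the breadth-first-search "planner", the lookup-table policy, and the helper names (plan, collect_demos, fit_policy, rollout, planner_calls) are all hypothetical. They are not the authors' TAMP solver, hierarchical architecture, or training procedure. The "try the policy, fall back to the planner" loop is a deliberately simple substitute for however the paper actually uses partially trained policies inside the solver.

```python
from collections import Counter, defaultdict, deque

# Hypothetical toy domain: move an agent along a 1-D line of N cells to a goal
# cell. N and ACTIONS are illustrative stand-ins for a real robot task.
N = 10
ACTIONS = (-1, +1)


def step(pos, action):
    """Deterministic transition model, clamped to the cells [0, N-1]."""
    return min(max(pos + action, 0), N - 1)


def plan(start, goal):
    """Stand-in 'planner' (not TAMP): breadth-first search for an action sequence."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        if pos == goal:
            break
        for a in ACTIONS:
            nxt = step(pos, a)
            if nxt not in parent:
                parent[nxt] = (pos, a)
                queue.append(nxt)
    actions, pos = [], goal
    while parent[pos] is not None:  # walk back from the goal to the start
        pos, a = parent[pos]
        actions.append(a)
    return actions[::-1]


def collect_demos(tasks):
    """Execute planner solutions and record (observation, action) supervision pairs."""
    data = []
    for start, goal in tasks:
        pos = start
        for a in plan(start, goal):
            data.append(((pos, goal), a))
            pos = step(pos, a)
    return data


def fit_policy(data):
    """Behavior cloning with a lookup table: majority-vote action per observation."""
    votes = defaultdict(Counter)
    for obs, a in data:
        votes[obs][a] += 1
    return {obs: counts.most_common(1)[0][0] for obs, counts in votes.items()}


def rollout(policy, start, goal, max_steps=2 * N):
    """Run the feed-forward policy with no search; True if it reaches the goal."""
    pos = start
    for _ in range(max_steps):
        if pos == goal:
            return True
        a = policy.get((pos, goal))
        if a is None:  # observation never seen in the demonstrations
            return False
        pos = step(pos, a)
    return pos == goal


def planner_calls(policy, tasks):
    """Count planner invocations when each task first tries a policy rollout
    and falls back to the (expensive) planner only if that rollout fails."""
    calls = 0
    for start, goal in tasks:
        if not rollout(policy, start, goal):
            plan(start, goal)  # fallback: full search
            calls += 1
    return calls


tasks = [(s, g) for s in range(N) for g in range(N)]
partial = fit_policy(collect_demos(tasks[::2]))  # trained on half of the tasks
full = fit_policy(collect_demos(tasks))
print("planner calls, no policy:     ", len(tasks))
print("planner calls, partial policy:", planner_calls(partial, tasks))
print("full-policy success rate:     ", sum(rollout(full, s, g) for s, g in tasks) / len(tasks))
```

A lookup table only covers observations that appear in the demonstrations, so the partially trained policy still needs the planner for goals it never saw. Handling pose or image observations as in the abstract would require a function approximator that generalizes across observations.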
