Chain-of-Thought Predictive Control

24 Sept 2023 (modified: 11 Feb 2024)
Submitted to ICLR 2024
Primary Area: applications to robotics, autonomy, planning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Hierarchical Imitation Learning, Robotic Manipulation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: An imitation learning algorithm that solves hard low-level control tasks by adopting subgoal predictions learned from the unsupervised discovery of subgoals in the demonstrations.
Abstract: We study generalizable policy learning from demonstrations for complex low-level control tasks (e.g., contact-rich object manipulations). We propose a novel hierarchical imitation learning method that utilizes scalable, albeit sub-optimal, demonstrations. First, we propose an observation space-agnostic approach that efficiently discovers the multi-step subgoal decomposition (sequences of key observations) of the demos in an unsupervised manner. By grouping temporally close and functionally similar actions into subskill-level segments, the discovered breakpoints (the segment boundaries) constitute a chain of planning steps (i.e., the chain-of-thought) to complete the task. Next, we propose a Transformer-based design that effectively learns to predict the chain-of-thought (CoT) as the high-level guidance for low-level action. We couple action and CoT predictions via prompt tokens and a hybrid masking strategy, which enable dynamically updated CoT guidance at test time and improve feature representation of the trajectory for generalizable policy learning. Our method, named Chain-of-Thought Predictive Control (CoTPC), consistently surpasses existing strong baselines on a wide range of challenging low-level manipulation tasks with scalable yet sub-optimal demos.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9232
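
Illustrative sketch (not from the submission): the abstract describes grouping temporally close and functionally similar actions into segments whose boundaries serve as subgoals. The abstract does not give the procedure, so the Python below is only one plausible reading, under loud assumptions: "functionally similar" is approximated by cosine similarity between an action and the running mean of the current segment, the `threshold` value is arbitrary, and the names `segment_actions` and `subgoal_indices` are hypothetical rather than taken from the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if exactly one is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0 if nu == nv else 0.0
    return dot / (nu * nv)


def segment_actions(actions, threshold=0.5):
    """Greedily split a demo's action sequence into contiguous segments.

    Assumption: an action joins the current segment when its cosine
    similarity to the segment's running mean is at least `threshold`;
    otherwise a new segment starts. Returns inclusive (start, end) pairs.
    """
    if not actions:
        return []
    segments = []
    start, count = 0, 1
    mean = list(actions[0])
    for t in range(1, len(actions)):
        a = actions[t]
        if cosine(mean, a) >= threshold:
            # Same subskill: fold the action into the running mean.
            count += 1
            mean = [m + (x - m) / count for m, x in zip(mean, a)]
        else:
            # Functional change: close the segment and open a new one.
            segments.append((start, t - 1))
            start, count = t, 1
            mean = list(a)
    segments.append((start, len(actions) - 1))
    return segments


def subgoal_indices(segments):
    """Time steps at segment boundaries; observations there act as subgoals."""
    return [end for _, end in segments]


# Toy 2-D action sequence: move right, then move up, then move left.
demo = [(1, 0), (0.9, 0.1), (1, 0.05), (0, 1), (0.1, 0.9), (-1, 0)]
segments = segment_actions(demo)
print(segments)                   # [(0, 2), (3, 4), (5, 5)]
print(subgoal_indices(segments))  # [2, 4, 5]
```

In this reading, the observations at the returned indices would form the chain of key observations that the high-level predictor is trained to output; the actual similarity measure and grouping rule used by CoTPC may differ.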