Chain-of-Thought Predictive Control

Zhiwei Jia; Fangchen Liu; Vineet Thumuluri; Linghao Chen; Zhiao Huang; Hao Su

Published: 03 Mar 2023, Last Modified: 18 May 2025 | RRL 2023 Poster | Readers: Everyone

Keywords: Behavior Cloning, Hierarchical Imitation Learning, Generalizable Policy Learning, Object Manipulation

TL;DR: We propose a powerful imitation learning method that reformulates hierarchical principles to solve challenging contact-rich control tasks.

Abstract: We study generalizable policy learning from demonstrations for complex low-level control tasks (e.g., contact-rich object manipulation). We propose an imitation learning method that incorporates the temporal abstraction and planning capabilities of Hierarchical RL (HRL) in a novel and effective manner. As a step towards decision foundation models, our design can utilize scalable, albeit highly sub-optimal, demonstrations. Specifically, we find that certain short subsequences of the demos, i.e., the chain-of-thought (CoT), reflect their hierarchical structure by marking the completion of subgoals in the tasks. Our model learns to dynamically predict the entire CoT as coherent and structured long-term action guidance, and it consistently outperforms typical two-stage subgoal-conditioned policies. Moreover, such CoT facilitates generalizable policy learning because it exemplifies the decision patterns shared among demos (even those with heavy noise and randomness). Our method, Chain-of-Thought Predictive Control (CoTPC), significantly outperforms existing methods on challenging low-level manipulation tasks when learning from scalable yet highly sub-optimal demos.

Track: Technical Paper

Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/chain-of-thought-predictive-control/code)
