C3DM: Constrained-Context Conditional Diffusion Models for Imitation Learning

Published: 15 Aug 2024, Last Modified: 17 Sept 2024Accepted by TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Behavior Cloning (BC) methods are effective at learning complex manipulation tasks. However, they are prone to spurious correlation - expressive models may focus on distractors that are irrelevant to action prediction - and are thus fragile in real-world deployment. Prior methods have addressed this challenge by exploring different model architectures and action representations. However, none were able to balance between sample efficiency and robustness against distractors for solving manipulation tasks with a complex action space. We present Constrained-Context Conditional Diffusion Model (C3DM), a diffusion model policy for solving 6-DoF robotic manipulation tasks with robustness to distractions that can learn deployable robot policies from as little as five demonstrations. A key component of C3DM is a \emph{fixation} step that helps the action denoiser to focus on task-relevant regions around a predicted \emph{fixation point} while ignoring distractors in the context. We empirically show that C3DM is robust to out-of-distribution distractors, and consistently achieves high success rates on a wide array of tasks, ranging from table-top manipulation to industrial kitting that require varying levels of precision and robustness to distractors.
Submission Length: Regular submission (no more than 12 pages of main content)
Supplementary Material: zip
Assigned Action Editor: ~Matthew_Walter1
Submission Number: 2430
Loading