CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation

Published: 30 Aug 2023, Last Modified: 30 Aug 2023
CoRL 2023 Poster
Readers: Everyone
Keywords: Contact-rich Manipulation, Visual-language guided policies
Abstract: Making contact with purpose is a central part of robot manipulation and remains essential for many household tasks, from sweeping dust into a dustpan to wiping tables, and from erasing whiteboards to applying paint. In this work, we investigate learning language-conditioned, vision-based manipulation policies in which the action representation is, in fact, contact itself: the policy predicts contact formations at which tools grasped by the robot should meet an observable surface. Our approach, Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation (CALAMARI), exhibits several advantages, including (i) benefiting from existing visual-language models for pretrained spatial features, for grounding instructions to behaviors, and for sim2real transfer; and (ii) factorizing perception and control over a natural boundary (i.e., contact) into two modules that synergize with each other, whereby action predictions can be aligned per pixel with image observations, and low-level controllers can optimize motion trajectories that maintain contact while avoiding penetration. Experiments show that CALAMARI outperforms existing state-of-the-art model architectures on a broad range of contact-rich tasks, and breaks new ground in embodiment-agnostic generalization to unseen objects with varying elasticity, geometry, and colors in both simulated and real-world settings.
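To make the per-pixel alignment between action predictions and image observations concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes the perception module outputs a contact score per pixel, that a depth image and pinhole camera intrinsics are available, and that a fixed threshold selects contact pixels. The function names `contact_pixels` and `back_project`, the threshold value, and the intrinsics `fx`, `fy`, `cx`, `cy` are all hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]                 # (row, col) in the image
Point3D = Tuple[float, float, float]    # (x, y, z) in the camera frame


def contact_pixels(contact_map: List[List[float]],
                   threshold: float = 0.5) -> List[Pixel]:
    """Select pixels whose predicted contact score exceeds the threshold.

    Assumption: contact_map has the same height and width as the image,
    so each score is aligned with one observed pixel.
    """
    return [
        (r, c)
        for r, row in enumerate(contact_map)
        for c, score in enumerate(row)
        if score > threshold
    ]


def back_project(pixels: List[Pixel], depth: List[List[float]],
                 fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Lift contact pixels to 3D camera-frame points with a pinhole model.

    Assumption: depth[r][c] is the metric depth along the optical axis.
    A low-level controller could then take these points as contact goals.
    """
    points = []
    for r, c in pixels:
        z = depth[r][c]
        x = (c - cx) * z / fx
        y = (r - cy) * z / fy
        points.append((x, y, z))
    return points


# Toy 3x3 example with made-up scores and a flat surface at depth 1.0.
toy_map = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 0.1, 0.1],
]
toy_depth = [[1.0] * 3 for _ in range(3)]

selected = contact_pixels(toy_map)
goals = back_project(selected, toy_depth, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```

In this sketch the perception side only decides where contact should occur in image space; how a controller maintains that contact while avoiding penetration is left out entirely.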
Student First Author: yes
Supplementary Material: zip