InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation

Published: 05 Sept 2024, Last Modified: 16 Oct 2024 · CoRL 2024 · CC BY 4.0
Keywords: Robotics, Imitation Learning, Bimanual Manipulation
TL;DR: We present InterACT, a framework for robust bimanual manipulation that integrates hierarchical attention transformers to capture inter-dependencies between dual-arm joint states and visual inputs.
Abstract: We present InterACT: Inter-dependency aware Action Chunking with Hierarchical Attention Transformers, a novel imitation learning framework for bimanual manipulation that integrates hierarchical attention to capture inter-dependencies between dual-arm joint states and visual inputs. InterACT consists of a Hierarchical Attention Encoder and a Multi-arm Decoder, both designed to enhance information aggregation and coordination. The encoder processes multi-modal inputs through segment-wise and cross-segment attention mechanisms, while the decoder leverages synchronization blocks to refine individual action predictions, providing the counterpart's prediction as context. Our experiments on a variety of simulated and real-world bimanual manipulation tasks demonstrate that InterACT significantly outperforms existing methods. Detailed ablation studies validate the contributions of key components of our work, including the impact of CLS tokens, cross-segment encoders, and synchronization blocks.
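To make the architecture described in the abstract concrete, below is a minimal sketch (not the authors' code) of the hierarchical attention idea: per-segment self-attention with CLS tokens, a cross-segment encoder over the CLS summaries, and a synchronization step that conditions each arm's chunked action prediction on the other arm's prediction. All module names, dimensions, segment choices, and the pooling/refinement details are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of InterACT-style hierarchical attention and synchronized decoding.
# Layer sizes, segment layout, and refinement logic are assumptions for illustration.
import torch
import torch.nn as nn


class SegmentEncoder(nn.Module):
    """Self-attention within one input segment (e.g. one arm's joint states
    or one camera's visual tokens), with a learnable CLS token prepended."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor):
        b = x.size(0)
        x = torch.cat([self.cls.expand(b, -1, -1), x], dim=1)
        x = self.layer(x)
        return x[:, 0], x[:, 1:]          # CLS summary, token features


class HierarchicalEncoder(nn.Module):
    """Segment-wise attention per segment, then cross-segment attention over
    the CLS summaries so segments can exchange information."""

    def __init__(self, dim: int, num_segments: int, heads: int = 4):
        super().__init__()
        self.segments = nn.ModuleList(
            [SegmentEncoder(dim, heads) for _ in range(num_segments)]
        )
        self.cross = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, segment_inputs):
        cls_tokens = [enc(x)[0] for enc, x in zip(self.segments, segment_inputs)]
        return self.cross(torch.stack(cls_tokens, dim=1))  # (B, num_segments, dim)


class SynchronizedDecoder(nn.Module):
    """Per-arm action heads plus a synchronization step that refines each
    arm's action chunk using the counterpart arm's prediction as context."""

    def __init__(self, dim: int, chunk: int, act_dim: int):
        super().__init__()
        self.head_l = nn.Linear(dim, chunk * act_dim)
        self.head_r = nn.Linear(dim, chunk * act_dim)
        self.sync_l = nn.Linear(2 * chunk * act_dim, chunk * act_dim)
        self.sync_r = nn.Linear(2 * chunk * act_dim, chunk * act_dim)
        self.chunk, self.act_dim = chunk, act_dim

    def forward(self, context: torch.Tensor):
        z = context.mean(dim=1)                        # pool fused CLS tokens
        a_l, a_r = self.head_l(z), self.head_r(z)      # initial per-arm predictions
        r_l = self.sync_l(torch.cat([a_l, a_r], -1))   # refine left with right's prediction
        r_r = self.sync_r(torch.cat([a_r, a_l], -1))   # refine right with left's prediction
        b = z.size(0)
        return (r_l.view(b, self.chunk, self.act_dim),
                r_r.view(b, self.chunk, self.act_dim))


if __name__ == "__main__":
    dim, segs = 64, 3                                  # e.g. left arm, right arm, vision
    enc = HierarchicalEncoder(dim, segs)
    dec = SynchronizedDecoder(dim, chunk=10, act_dim=7)
    inputs = [torch.randn(2, 5, dim) for _ in range(segs)]
    left, right = dec(enc(inputs))
    print(left.shape, right.shape)                     # torch.Size([2, 10, 7]) each
```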
Supplementary Material: zip
Website: https://soltanilara.github.io/interact/
Publication Agreement: pdf
Student Paper: yes
Submission Number: 552