M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place

Wentao Yuan; Adithyavairavan Murali; Arsalan Mousavian; Dieter Fox

Published: 30 Aug 2023, Last Modified: 20 Apr 2025, CoRL 2023 Poster, Readers: Everyone

Keywords: Object manipulation, Multi-task learning, Pick and place

TL;DR: M2T2 (Multi-Task Masked Transformer) is a unified network architecture for predicting different types of action primitives.

Abstract: With the advent of large language models and large-scale robotic datasets, there has been tremendous progress in high-level decision-making for object manipulation. These generic models can interpret complex tasks from language commands, but they often have difficulty generalizing to out-of-distribution objects because of the limitations of their low-level action primitives. In contrast, existing task-specific models excel at low-level manipulation of unknown objects, but only work for a single type of action. To bridge this gap, we present M2T2, a single model that supplies different types of low-level actions that work robustly on arbitrary objects in cluttered scenes. M2T2 is a transformer model that reasons about contact points and predicts valid gripper poses for different action modes given a raw point cloud of the scene. Trained on a large-scale synthetic dataset with 128K scenes, M2T2 achieves zero-shot sim2real transfer on a real robot, outperforming a baseline system built from state-of-the-art task-specific models by about 19% in overall performance and by 37.5% in challenging scenes where the object needs to be re-oriented for collision-free placement. M2T2 also achieves state-of-the-art results on a subset of language-conditioned tasks in RLBench. Videos of robot experiments on unseen objects in both the real world and simulation are available at m2-t2.github.io.

Student First Author: yes

Supplementary Material: zip

Video: https://m2-t2.github.io

Website: https://m2-t2.github.io

Code: https://m2-t2.github.io

Publication Agreement: pdf

Poster Spotlight Video: mp4

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/m2t2-multi-task-masked-transformer-for-object/code)
