Open6DOR: Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach

Published: 01 Oct 2024, Last Modified: 03 Dec 2024, BoB Workshop 2024, CC BY 4.0
Keywords: Object Rearrangement, Robotic Manipulation, Open-vocabulary Manipulation
Abstract: The integration of large-scale Vision-Language Models (VLMs) with embodied AI can greatly enhance robots' generalizability and their capacity to follow open instructions. However, existing studies on object manipulation do not fully consider 6-DoF requirements, let alone establish a comprehensive benchmark. In this paper, we pioneer the construction of a benchmark and an approach for Open-instruction 6-DoF Object Rearrangement (Open6DOR). Specifically, we collect a synthetic dataset of 200+ objects and carefully design 5400+ Open6DOR tasks. These tasks are divided into a Position-track, a Rotation-track, and a 6-DoF-track, which evaluate how well different embodied agents predict the positions and rotations of target objects. We also propose a VLM-based approach for Open6DOR, named Open6DOR-GPT, which equips GPT-4V with 3D awareness and simulation assistance while exploiting its strengths in generalizability and instruction following. We compare existing embodied agents with Open6DOR-GPT on the proposed Open6DOR benchmark and find that Open6DOR-GPT achieves state-of-the-art performance. We further demonstrate the strong performance of Open6DOR-GPT in diverse real-world experiments.
Submission Number: 9