Keywords: robot, benchmark, dataset, mobile manipulation, robotics, navigation, perception, manipulation, natural language, multiroom environments, long-horizon
TL;DR: A robot mobile manipulation benchmark integrating language, navigation, manipulation, and perception for long-horizon tasks in multiroom environments.
Abstract: As robots that follow natural language become more capable and prevalent, we need a benchmark to holistically develop and evaluate their ability to solve long-horizon mobile manipulation tasks in large, diverse environments. To tackle this challenge, robots must use visual and language understanding, navigation, and manipulation capabilities. Existing datasets do not integrate all these aspects, restricting their efficacy as benchmarks. To address this gap, we present the Language, Navigation, Manipulation, Perception (LaNMP) dataset and demonstrate the benefits of integrating these four capabilities and various modalities. LaNMP comprises 574 trajectories across eight simulated and real-world environments for long-horizon room-to-room pick-and-place tasks specified by natural language. Each trajectory consists of over 20 attributes, including RGB-D images, segmentations, and the poses of the robot body, end-effector, and grasped objects. We fine-tuned and tested two models in simulation to demonstrate the benchmark's efficacy for development and evaluation, as well as its potential for improving model sample efficiency. The models performed suboptimally compared to humans across various metrics but showed promising gains in sample efficiency, indicating significant room for developing better multimodal mobile manipulation models using our benchmark.
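To make the trajectory contents concrete, the sketch below outlines how one LaNMP-style trajectory record might be represented; the class and field names are hypothetical illustrations derived only from the attributes named in the abstract (RGB-D images, segmentations, and poses of the robot body, end-effector, and grasped objects), not the dataset's actual schema.

```python
# Hypothetical sketch of a LaNMP-style trajectory record; field names are
# assumptions mirroring the attributes listed in the abstract, not the
# dataset's real API.
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class TrajectoryStep:
    rgb: np.ndarray                            # H x W x 3 color image
    depth: np.ndarray                          # H x W depth map
    segmentation: np.ndarray                   # H x W segmentation mask
    body_pose: np.ndarray                      # e.g. (x, y, z, qx, qy, qz, qw)
    end_effector_pose: np.ndarray              # end-effector pose in the same format
    grasped_object_pose: Optional[np.ndarray]  # None when nothing is grasped

@dataclass
class Trajectory:
    instruction: str                           # natural-language task description
    steps: List[TrajectoryStep] = field(default_factory=list)

# Example: one placeholder step for a room-to-room pick-and-place task.
traj = Trajectory(instruction="Take the mug from the kitchen to the bedroom desk.")
traj.steps.append(TrajectoryStep(
    rgb=np.zeros((480, 640, 3), dtype=np.uint8),
    depth=np.zeros((480, 640), dtype=np.float32),
    segmentation=np.zeros((480, 640), dtype=np.int32),
    body_pose=np.zeros(7),
    end_effector_pose=np.zeros(7),
    grasped_object_pose=None,
))
```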
Submission Number: 29