Keywords: Rearrangement, Robot Manipulation, Task and Motion Planning
TL;DR: We propose a method for tabletop object rearrangement that handles cluttered initial scenes, target object selectivity, and occupied goal positions; it is the first system to address all three concurrently in a purely image-based setting.
Abstract: We propose an image-based, learned method for selective tabletop object rearrangement in clutter using a parallel-jaw gripper. Our method consists of three stages: graph-based object sequencing (which object to move), feature-based action selection (whether to push or grasp, and at what position and orientation), and a visual correspondence-based placement policy (where to place a grasped object). Experiments show that this decomposition works well in challenging settings that require the robot to begin with a cluttered scene, select only the objects that need to be rearranged while discarding others, and handle cases where the goal location for an object is already occupied, making it the first system to address all of these concurrently in a purely image-based setting. We also achieve a $\sim$8% improvement in task success rate over the best previously reported result that handles both translation and orientation in less restrictive (uncluttered, non-selective) settings. We demonstrate zero-shot transfer of our system, trained solely in simulation, to a real robot selectively rearranging up to everyday objects, many unseen during learning, on a crowded tabletop. Videos: https://sites.google.com/view/selective-rearrangement
Student First Author: yes
Supplementary Material: zip