SORNet: Spatial Object-Centric Representations for Sequential Manipulation

19 Jun 2021, 10:05 (edited 08 Nov 2021), CoRL 2021 Oral
  • Keywords: Object-centric Representation, Spatial Reasoning, Manipulation
  • Abstract: Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects or new tasks. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks (spatial relationship classification, skill precondition classification, and relative direction regression), significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the use of the learned object embeddings in task planning for sequential manipulation.