SORNet: Spatial Object-Centric Representations for Sequential Manipulation

Wentao Yuan; Chris Paxton; Karthik Desingh; Dieter Fox

Published: 13 Sept 2021, Last Modified: 27 Apr 2025, CoRL 2021 Oral, Readers: Everyone

Keywords: Object-centric Representation, Spatial Reasoning, Manipulation

Abstract: Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects or new tasks. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.
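The abstract says the object embeddings are used for spatial relationship classification and relative direction regression, but it does not describe the readout heads. Below is a minimal, non-authoritative Python sketch of one plausible readout. It assumes a single shared linear head applied to the concatenated embeddings of two objects, with a sigmoid per predicate, and a separate linear map whose output is normalized to a unit direction vector. The toy 2-D embeddings, the hand-set weights, and the names `predicate_probs` and `relative_direction` are hypothetical illustrations. In SORNet the embeddings come from a learned network conditioned on canonical object views, and in practice any such head would be learned, not hand-set.

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def predicate_probs(
    embeddings: Dict[str, Vector],
    weights: Dict[str, Vector],
    biases: Dict[str, float],
) -> Dict[Tuple[str, str, str], float]:
    """Score every predicate p on every ordered object pair (a, b).

    Hypothetical shared linear head: P(p(a, b)) = sigmoid(w_p . [e_a; e_b] + b_p).
    The same head is reused for all pairs, so a new object needs only an embedding.
    """
    probs: Dict[Tuple[str, str, str], float] = {}
    for a, b in permutations(embeddings, 2):
        pair_feature = embeddings[a] + embeddings[b]  # list concatenation [e_a; e_b]
        for pred, w in weights.items():
            probs[(pred, a, b)] = sigmoid(dot(w, pair_feature) + biases[pred])
    return probs


def relative_direction(e_a: Vector, e_b: Vector, matrix: List[Vector]) -> Vector:
    """Map [e_a; e_b] linearly to a vector and normalize it to unit length."""
    raw = [dot(row, e_a + e_b) for row in matrix]
    norm = math.sqrt(sum(x * x for x in raw))
    if norm == 0.0:
        raise ValueError("direction is undefined for a zero vector")
    return [x / norm for x in raw]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for learned object embeddings (hypothetical).
    embeddings = {"red_block": [0.1, 0.5], "blue_block": [0.8, 0.5]}
    # Hand-set head: left_of(a, b) is likely when b's first feature exceeds a's.
    weights = {"left_of": [-10.0, 0.0, 10.0, 0.0]}
    biases = {"left_of": 0.0}

    for key, p in sorted(predicate_probs(embeddings, weights, biases).items()):
        print(key, round(p, 3))

    # A new object needs only an embedding; the head itself is unchanged.
    embeddings["green_block"] = [0.5, 0.2]
    print(len(predicate_probs(embeddings, weights, biases)))  # 1 predicate x 6 ordered pairs = 6 scores

    # Direction from red_block to blue_block; this toy matrix computes e_b - e_a.
    diff = [[-1.0, 0.0, 1.0, 0.0], [0.0, -1.0, 0.0, 1.0]]
    print(relative_direction(embeddings["red_block"], embeddings["blue_block"], diff))
```

Because the same head is applied to every ordered pair, a newly seen object can be scored as soon as it has an embedding, and no per-object classifier is needed. Whether the scores are meaningful for unseen objects then depends entirely on how well the embeddings generalize, which is the property the abstract evaluates.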

Supplementary Material: zip

Poster: pdf

Community Implementations: [5 code implementations](https://www.catalyzex.com/paper/sornet-spatial-object-centric-representations/code)
