Unsupervised Discovery of 3D Physical Objects from Video

Yilun Du; Kevin A. Smith; Tomer Ullman; Joshua B. Tenenbaum; Jiajun Wu

Published: 12 Jan 2021 | Last Modified: 12 Oct 2025 | ICLR 2021 Poster | Readers: Everyone

Keywords: unsupervised object discovery, surprisal, scene decomposition, physical scene understanding

Abstract: We study the problem of unsupervised physical object discovery. While existing frameworks aim to decompose scenes into 2D segments based on each object's appearance, we explore how physics, especially object interactions, facilitates disentangling the 3D geometry and position of objects from video in an unsupervised manner. Drawing inspiration from developmental psychology, our Physical Object Discovery Network (POD-Net) uses both multi-scale pixel cues and physical motion cues to accurately segment observable and partially occluded objects of varying sizes and to infer properties of those objects. Our model reliably segments objects in both synthetic and real scenes. The discovered object properties can also be used to reason about physical events.

One-sentence Summary: We propose an unsupervised framework for discovering 3D physical objects and show that these objects can be used for tasks mimicking early infant cognition.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Data: [ShapeNet](https://paperswithcode.com/dataset/shapenet)

Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/unsupervised-discovery-of-3d-physical-objects/code)
