TL;DR: We propose a two-stage pipeline for unsupervised multi-object segmentation on single images by learning and reasoning with three-level object-centric representations.
Abstract: We study the challenging problem of unsupervised multi-object segmentation on single images. Existing methods, which rely on image reconstruction objectives to learn objectness or leverage pretrained image features to group similar pixels, often succeed only in segmenting simple synthetic objects or discovering a limited number of real-world objects. In this paper, we introduce unMORE, a novel two-stage pipeline designed to identify many complex objects in real-world images. The key to our approach is to explicitly learn three levels of carefully defined object-centric representations in the first stage. Subsequently, our multi-object reasoning module utilizes these learned object priors to discover multiple objects in the second stage. Notably, this reasoning module is entirely network-free and does not require human labels. Extensive experiments demonstrate that unMORE significantly outperforms all existing unsupervised methods across 6 real-world benchmark datasets, including the challenging COCO dataset, achieving state-of-the-art object segmentation results. Remarkably, our method excels in crowded images where all baselines collapse. Our code and data are available at https://github.com/vLAR-group/unMORE.
Lay Summary: Identifying multiple objects in a single image without prior labels is a tough challenge in computer vision. Humans, however, can effortlessly recognize multiple objects in new situations, such as identifying various animals after reading a book. Inspired by this ability, we investigate two critical issues: 1) how to define objects (i.e., objectness), and 2) how to discover objects in unseen scenes.
We first learn objectness from monolithic object images. The learned objectness priors are then used to discover multiple objects in unseen scenes, without extra training or human labels. Specifically, we find that object centers and boundaries are informative characteristics that contribute to effective object reasoning.
Our work demonstrates state-of-the-art performance on challenging crowded images and opens a different path for unsupervised object discovery. By moving beyond traditional similarity-based methods, we highlight the potential of exploring richer object characteristics to improve object perception.
Link To Code: https://github.com/vLAR-group/unMORE
Primary Area: Deep Learning->Other Representation Learning
Keywords: unsupervised object segmentation, object-centric representation
Submission Number: 5450