Abstract: We explore the task of zero-shot semantic segmentation of 3D shapes by using large-scale off-the-shelf 2D image recognition models. Surprisingly, we find that modern zero-shot 2D object detectors are better suited for this
task than contemporary text/image similarity predictors or
even zero-shot 2D segmentation networks. Our key finding
is that it is possible to extract accurate 3D segmentation
maps from multi-view bounding box predictions by using
the topological properties of the underlying surface. For
this, we develop the Segmentation Assignment with Topological Reweighting (SATR) algorithm and evaluate it on
ShapeNetPart and our proposed FAUST benchmarks. SATR
achieves state-of-the-art performance and outperforms a
baseline algorithm by 1.3% and 4% average mIoU on the
FAUST coarse and fine-grained benchmarks, respectively,
and by 5.2% average mIoU on the ShapeNetPart benchmark. Our source code and data will be publicly released.
Project webpage: https://samir55.github.io/SATR/.
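
For readers unfamiliar with the metric reported above, the following is a minimal sketch of mean intersection-over-union (mIoU) computed over per-face part labels of a single shape. It illustrates the generic definition only: the function name `part_miou`, the integer label encoding, and the choice to skip parts absent from both prediction and ground truth are assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
def part_miou(pred, gt, num_parts):
    """Mean IoU over parts for one shape.

    pred, gt: equal-length sequences of integer part labels, one per mesh face.
    num_parts: number of part classes (labels are assumed to be 0..num_parts-1).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    ious = []
    for part in range(num_parts):
        # Faces labeled `part` in both prediction and ground truth.
        inter = sum(1 for p, g in zip(pred, gt) if p == part and g == part)
        # Faces labeled `part` in either prediction or ground truth.
        union = sum(1 for p, g in zip(pred, gt) if p == part or g == part)
        if union == 0:
            # Assumption: a part absent from both is skipped rather than
            # counted as IoU 1; benchmarks differ on this convention.
            continue
        ious.append(inter / union)

    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Toy example with four faces and two parts that actually occur.
    print(part_miou([0, 0, 1, 1], [0, 1, 1, 1], num_parts=3))
```

In the toy example, part 0 has IoU 1/2 and part 1 has IoU 2/3, so the mean is 7/12. Averaging this per-shape value over a dataset, or over categories, gives the kind of average mIoU figure quoted in the abstract.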