MaskClustering: View Consensus Based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation

Mi Yan, Jiazhao Zhang, Yan Zhu, He Wang

Published: 2024, Last Modified: 11 Nov 2024, CVPR 2024, License: CC BY-SA 4.0

Abstract: Open-vocabulary 3D instance segmentation is cutting-edge for its ability to segment 3D instances without predefined categories. However, progress in 3D lags behind its 2D counterpart due to limited annotated 3D data. To address this, recent works first generate 2D open-vocabulary masks through 2D models and then merge them into 3D instances based on metrics calculated between two neighboring frames. In contrast to these local metrics, we propose a novel metric, view consensus rate, to enhance the utilization of multi-view observations. The key insight is that two 2D masks should be deemed part of the same 3D instance if a significant number of other 2D masks from different views contain both these two masks. Using this metric as edge weight, we construct a global mask graph where each mask is a node. Through iterative clustering of masks showing high view consensus, we generate a series of clusters, each representing a distinct 3D instance. Notably, our model is training-free. Through extensive experiments on publicly available datasets, including ScanNet++, ScanNet200 and Matterport3D, we demonstrate that our method achieves state-of-the-art performance in open-vocabulary 3D instance segmentation. Our project page is at https://pku-epic.github.io/MaskClustering.
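
The view consensus idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: each 2D mask is represented by the set of 3D point ids it back-projects to, "contains" means that at least 80% of a mask's points fall inside the other mask, the rate is normalized by the number of other views that see both masks, and the clustering is a single union-find pass rather than the iterative procedure the abstract mentions. All names (`contains`, `view_consensus_rate`, `cluster_masks`, the thresholds, the toy data) are hypothetical.

```python
from itertools import combinations


def contains(outer, inner, min_overlap=0.8):
    """True if at least `min_overlap` of the points in `inner` also lie in `outer`."""
    return len(inner & outer) >= min_overlap * len(inner)


def view_consensus_rate(a, b, masks):
    """Share of observing views that have one mask containing both `a` and `b`.

    masks: dict mask_id -> (frame_id, frozenset of 3D point ids).
    The normalization by observing views is an assumption of this sketch.
    """
    frame_a, pts_a = masks[a]
    frame_b, pts_b = masks[b]

    # Group the masks of all *other* views by frame.
    by_frame = {}
    for mask_id, (frame, pts) in masks.items():
        if mask_id not in (a, b) and frame not in (frame_a, frame_b):
            by_frame.setdefault(frame, []).append(pts)

    observers = supporters = 0
    for frame_masks in by_frame.values():
        visible = frozenset().union(*frame_masks)
        # A view "observes" the pair if its masks touch both a and b.
        if visible & pts_a and visible & pts_b:
            observers += 1
            # It "supports" the pair if a single mask contains both.
            if any(contains(m, pts_a) and contains(m, pts_b) for m in frame_masks):
                supporters += 1
    return supporters / observers if observers else 0.0


def cluster_masks(masks, threshold=0.9):
    """Merge masks whose consensus rate reaches `threshold` (one union-find pass)."""
    parent = {m: m for m in masks}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in combinations(masks, 2):
        if view_consensus_rate(a, b, masks) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for m in masks:
        clusters.setdefault(find(m), set()).add(m)
    return list(clusters.values())


# Toy scene: a chair (points 1-6) and a table (points 10-15) seen from four views.
chair, table = frozenset(range(1, 7)), frozenset(range(10, 16))
masks = {
    "seat": (0, frozenset({1, 2, 3})),
    "top": (0, frozenset({10, 11, 12})),
    "back": (1, frozenset({4, 5, 6})),
    "chair2": (2, chair),
    "table2": (2, table),
    "chair3": (3, chair),
    "table3": (3, table),
}
clusters = cluster_masks(masks)
```

In this toy scene the partial masks `seat` and `back` never overlap, yet both other views contain them in a single chair mask, so they end up in the same cluster; `seat` and `top` are observed together but never contained by one mask, so they stay apart.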