PanSt3R: Multi-view Consistent Panoptic Segmentation

Published: 28 Oct 2025 · Last Modified: 12 Nov 2025 · ICCV 2025 · CC BY 4.0
Abstract: Panoptic segmentation in 3D is a fundamental problem in scene understanding. Existing approaches typically rely on costly test-time optimization (often based on NeRF) to consolidate the 2D predictions of off-the-shelf panoptic segmentation methods into 3D. Instead, in this work, we propose a unified, integrated approach, PanSt3R, which eliminates the need for test-time optimization by jointly predicting 3D geometry and multi-view-consistent panoptic segmentation in a single forward pass. Our approach harnesses the 3D representations of MUSt3R, a recent scalable multi-view version of DUSt3R, together with the 2D representations of DINOv2, and performs joint multi-view panoptic prediction via a mask transformer architecture. We additionally revisit the standard post-processing mask-merging procedure and introduce a more principled approach for multi-view segmentation. We also introduce a simple method for generating novel-view predictions from the predictions of PanSt3R and vanilla 3DGS. Overall, the proposed PanSt3R is conceptually simple yet fast and scalable, and achieves state-of-the-art performance on several benchmarks while being orders of magnitude faster.
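To make the multi-view mask-merging step concrete, here is a minimal sketch of one common formulation of the problem (not the paper's actual algorithm, whose details are not given in the abstract): per-view instance masks are linked into globally consistent IDs by measuring the overlap of the 3D points each mask covers, with a union-find to group matched masks. The `iou_thresh` parameter and the representation of masks as sets of 3D point indices are illustrative assumptions.

```python
# Hypothetical sketch of multi-view mask merging: masks from different
# views that cover overlapping sets of 3D points get the same global ID.

def _find(parent, x):
    # Union-find root lookup with path compression.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def merge_masks(view_masks, iou_thresh=0.5):
    """view_masks: list (one entry per view) of dicts {local_id: set of
    3D point indices covered by that mask}. Returns a mapping
    (view_idx, local_id) -> global instance ID."""
    keys = [(v, m) for v, masks in enumerate(view_masks) for m in masks]
    parent = {k: k for k in keys}
    for i, ki in enumerate(keys):
        pts_i = view_masks[ki[0]][ki[1]]
        for kj in keys[i + 1:]:
            if kj[0] == ki[0]:
                continue  # masks within one view are disjoint; skip
            pts_j = view_masks[kj[0]][kj[1]]
            iou = len(pts_i & pts_j) / len(pts_i | pts_j)
            if iou >= iou_thresh:
                parent[_find(parent, ki)] = _find(parent, kj)
    roots = {k: _find(parent, k) for k in keys}
    ids = {r: i for i, r in enumerate(dict.fromkeys(roots.values()))}
    return {k: ids[r] for k, r in roots.items()}
```

For example, a mask covering 3D points {1, 2, 3} in view 0 and one covering {2, 3, 4} in view 1 have IoU 0.5 and would be assigned the same global ID, while a non-overlapping mask would receive a new one. PanSt3R's contribution is a more principled replacement for exactly this kind of greedy pairwise heuristic.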