SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians

Published: 05 Nov 2025, Last Modified: 30 Jan 2026
Venue: 3DV 2026 Oral
Readers: Everyone
License: CC BY 4.0
Keywords: 3D Gaussian Splatting, Open-Vocabulary Understanding, Semantic Segmentation
TL;DR: We embed high-dimensional language and semantic features into 3D Gaussian Splatting scenes for open-vocabulary understanding and segmentation tasks using a superpoint-like representation.
Abstract: 3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While its vanilla representation is mainly designed for view synthesis, recent works have extended it to scene understanding with language features. However, storing additional high-dimensional features per Gaussian for semantic information is memory-intensive, which limits the ability of these methods to segment and interpret challenging scenes. To this end, we introduce SuperGSeg, a novel approach that fosters a cohesive, context-aware hierarchical scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural 3D Gaussians to learn geometry, instance, and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of Super-Gaussians, which facilitate the lifting and distillation of 2D language features into 3D space. Super-Gaussians enable hierarchical scene understanding with high-dimensional language feature rendering at moderate GPU memory costs. Extensive experiments demonstrate that SuperGSeg achieves remarkable performance on both open-vocabulary object selection and semantic segmentation tasks. More results at supergseg.github.io.
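The memory argument in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how Gaussians are grouped into Super-Gaussians, so a nearest-center hard assignment in segmentation-feature space stands in for that step. The function names, the scene sizes, and the 512-dimensional language feature are all hypothetical values chosen for illustration.

```python
def feature_memory_bytes(num_items, feature_dim, bytes_per_value=4):
    """Bytes needed to store one float32 feature vector per item."""
    return num_items * feature_dim * bytes_per_value


def assign_to_super_gaussians(seg_features, centers):
    """Hard-assign each Gaussian to its nearest center in feature space.

    Assumption: nearest-center assignment is a stand-in for whatever
    grouping SuperGSeg actually learns.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centers)), key=lambda k: sq_dist(f, centers[k]))
        for f in seg_features
    ]


def gather_language_features(assignment, super_features):
    """Look up each Gaussian's language feature from its Super-Gaussian."""
    return [super_features[k] for k in assignment]


# Hypothetical scene sizes (not taken from the paper).
NUM_GAUSSIANS = 1_000_000
NUM_SUPER = 2_000
LANG_DIM = 512

# Option A: one language feature per Gaussian.
per_gaussian = feature_memory_bytes(NUM_GAUSSIANS, LANG_DIM)
# Option B: one language feature per Super-Gaussian,
# plus one 4-byte index per Gaussian pointing at its Super-Gaussian.
per_super = feature_memory_bytes(NUM_SUPER, LANG_DIM) + NUM_GAUSSIANS * 4

# Toy example: three Gaussians with 2-D segmentation features, two centers.
seg_features = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0]]
centers = [[0.0, 0.0], [1.0, 1.0]]
super_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy 3-D "language" features

assignment = assign_to_super_gaussians(seg_features, centers)
gathered = gather_language_features(assignment, super_features)

print(per_gaussian, per_super, assignment)
```

Under these assumed sizes, storing language features on the sparse set and keeping only an index per Gaussian needs a small fraction of the memory of per-Gaussian storage, which is the trade-off the abstract describes qualitatively.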
Supplementary Material: zip
Submission Number: 148