Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting

Published: 05 Nov 2025, Last Modified: 30 Jan 2026
Venue: 3DV 2026 Poster
License: CC BY 4.0
Keywords: 3D Reconstruction, 3D Scene Understanding, Open-Vocabulary Segmentation, 3D Gaussian Splatting
TL;DR: We fuse noisy, view-dependent 2D language features into 3D Gaussians via visibility-aware gating and a streaming, weighted geometric median, yielding sharper boundaries and cross-view-consistent open-vocabulary 3D semantics.
Abstract: Distilling open-vocabulary language features from 2D images into 3D Gaussians has recently attracted significant attention. Although existing methods achieve impressive language-based interaction with 3D scenes, we observe two fundamental issues: (1) background Gaussians that contribute negligibly to a rendered pixel receive the same language feature as the dominant foreground ones, and (2) view-specific noise in language embeddings causes multi-view inconsistencies. We introduce Visibility-Aware Language Aggregation (VALA), a lightweight yet effective method that computes marginal contributions for each ray and applies a visibility-aware gate to retain only visible Gaussians. Moreover, we propose a streaming weighted geometric median in cosine space to merge noisy multi-view features. Our method yields robust, view-consistent language feature embeddings in a fast and memory-efficient manner. VALA improves open-vocabulary localization and segmentation across reference datasets, consistently surpassing existing works. The source code is available at https://github.com/changandao/VALA.
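The abstract names two algorithmic components: a visibility-aware gate built from per-ray marginal contributions, and a streaming weighted geometric median in cosine space. Below is a minimal Python sketch of the gating step, assuming the standard 3D Gaussian Splatting alpha-compositing weights w_i = alpha_i * T_i; the function name `visibility_gate` and the threshold `tau` are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

def visibility_gate(alphas: np.ndarray, tau: float = 0.05) -> np.ndarray:
    """Gate per-Gaussian contributions along one ray.

    alphas: opacities of the Gaussians hit by the ray, sorted front to back.
    tau: hypothetical visibility threshold; the paper's gating rule may differ.
    Returns gated blending weights, so 2D language features are attached
    only to Gaussians that are actually visible along this ray.
    """
    # Transmittance before each Gaussian: T_i = prod_{j<i} (1 - alpha_j)
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    # Marginal contribution of each Gaussian to the rendered pixel: w_i = alpha_i * T_i
    weights = alphas * transmittance
    # Visibility-aware gate: zero out Gaussians with negligible contribution
    return np.where(weights >= tau, weights, 0.0)
```

For the multi-view fusion step, a batch Weiszfeld-style iteration illustrates a weighted geometric median under cosine distance; the paper's streaming variant would update the estimate one view at a time rather than holding all per-view features in memory. This is a sketch of the general technique, not the authors' code.

```python
import numpy as np

def weighted_geometric_median_cosine(feats: np.ndarray,
                                     weights: np.ndarray,
                                     iters: int = 20,
                                     eps: float = 1e-8) -> np.ndarray:
    """Weiszfeld-style weighted geometric median on the unit sphere.

    feats: (N, D) per-view language features, rows L2-normalized.
    weights: (N,) visibility-gated contribution weights.
    Uses cosine distance d(f, m) = 1 - <f, m>, which down-weights
    outlier views and yields a view-consistent fused feature.
    """
    m = feats.mean(axis=0)
    m /= np.linalg.norm(m) + eps
    for _ in range(iters):
        d = 1.0 - feats @ m                  # cosine distance of each view to the estimate
        coef = weights / np.maximum(d, eps)  # Weiszfeld reweighting: closer views count more
        m = coef @ feats                     # weighted combination of view features
        m /= np.linalg.norm(m) + eps         # project back onto the unit sphere
    return m
```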
Supplementary Material: zip
Submission Number: 101