ProFuse: Efficient Open-Vocabulary 3D Gaussian Splatting with Early-Saturating Semantic Uplifting

Published: 12 May 2026, Last Modified: 12 May 2026, 2nd ViSCALE @ CVPR 2026 Poster, Everyone, CC BY 4.0
Keywords: Open-vocabulary 3D scene understanding, 3D Gaussian Splatting, semantic uplifting, dense correspondence, multi-view consistency, query efficiency, compact scene representation, deployment efficiency
TL;DR: ProFuse builds a compact open-vocabulary 3D Gaussian scene that improves retrieval accuracy while reducing deployment and query cost.
Abstract: We present ProFuse, a resource-efficient framework for open-vocabulary 3D understanding with 3D Gaussian Splatting. ProFuse uses dense multi-view correspondences to initialize a compact Gaussian scene without densification and, at the same time, to link per-view masks into 3D Context Proposals. Each proposal aggregates a global feature from its member masks, and these global features are attached to Gaussians through visibility-weighted accumulation along camera rays, producing coherent per-primitive semantics without render-supervised training or gradient-based optimization. This correspondence-guided design reduces scene preparation cost, yields a smaller Gaussian set for downstream querying, and reaches stable accuracy at a shallow uplifting depth, which removes the need to sweep for an ideal depth per scene during feature uplifting. Experiments on LERF and ScanNet show that ProFuse reduces offline deployment cost and speeds up online querying while improving retrieval accuracy over prior 3D Gaussian baselines.
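The visibility-weighted accumulation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard alpha-compositing weight (a Gaussian's opacity times the transmittance remaining in front of it along the ray), and the function name `uplift_features`, the ray layout, and the toy proposal names are all hypothetical. Each ray carries the proposal that its pixel's mask belongs to, and every Gaussian the ray hits receives that proposal's global feature in proportion to its visibility.

```python
from collections import defaultdict


def uplift_features(rays, proposal_features, dim):
    """Attach proposal features to Gaussians by visibility-weighted averaging.

    rays: list of (proposal_id, hits), where hits is a front-to-back list of
          (gaussian_id, alpha) pairs along one camera ray (hypothetical layout).
    proposal_features: dict mapping proposal_id to a feature list of length dim.
    Returns a dict mapping gaussian_id to its weighted-average feature.
    """
    feat_sum = defaultdict(lambda: [0.0] * dim)
    weight_sum = defaultdict(float)

    for proposal_id, hits in rays:
        feature = proposal_features[proposal_id]
        transmittance = 1.0  # fraction of the ray not yet absorbed
        for gaussian_id, alpha in hits:
            # Assumed weight: opacity times transmittance in front of the Gaussian.
            weight = alpha * transmittance
            acc = feat_sum[gaussian_id]
            for k in range(dim):
                acc[k] += weight * feature[k]
            weight_sum[gaussian_id] += weight
            transmittance *= 1.0 - alpha

    # Normalize so each Gaussian holds a weighted average, with no gradient steps.
    return {
        g: [v / weight_sum[g] for v in feat_sum[g]]
        for g in feat_sum
        if weight_sum[g] > 0.0
    }


# Toy scene: two proposals, two Gaussians, two rays.
proposal_features = {"mug": [1.0, 0.0], "table": [0.0, 1.0]}
rays = [
    ("mug", [(0, 0.5), (1, 0.5)]),  # Gaussian 1 is half occluded by Gaussian 0
    ("table", [(1, 0.5)]),          # Gaussian 1 is fully visible on this ray
]
features = uplift_features(rays, proposal_features, dim=2)
```

In this toy scene Gaussian 0 takes the "mug" feature outright, while Gaussian 1 blends both proposals and leans toward "table" because it is less occluded on that ray. The whole pass is a single accumulation followed by a normalization, which is consistent with the abstract's claim that no render-supervised training is needed.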
Submission Number: 12