Keywords: Distilled radiance fields, Robotics, Geometry-grounded visual semantics, Gaussian Splatting, NeRFs
TL;DR: We explore geometry-grounded semantic features in distilled radiance fields and find that, although these features provide finer geometric detail, they do not outperform purely visual semantic features.
Abstract: Semantic distillation in radiance fields has spurred significant advances in
open-vocabulary robot policies, e.g., in manipulation and navigation, founded on
pretrained semantics from large vision models. While prior work has demonstrated the
effectiveness of visual-only semantic features (e.g., DINO and CLIP) in Gaussian
Splatting and neural radiance fields, the potential benefit of geometry-grounding in
distilled fields remains an open question. In principle, visual-geometry features
seem very promising for spatial tasks such as pose estimation, prompting the
question: Do geometry-grounded semantic features offer an edge in distilled fields?
Specifically, we ask three critical questions. First, does spatial-grounding produce
higher-fidelity geometry-aware semantic features? We find that image features
from geometry-grounded backbones contain finer structural details than those from
their visual-only counterparts. Second, does geometry-grounding improve semantic object
localization? We observe no significant difference in this task. Third, does
geometry-grounding enable higher-accuracy radiance field inversion? Given the
limitations of prior methods and their lack of semantic integration, we propose a novel
framework, SPINE, for inverting radiance fields without an initial guess, consisting
of two core components: (i) coarse inversion using distilled semantics, and (ii)
fine inversion using photometric-based optimization. Surprisingly, we find that the
pose estimation accuracy decreases with geometry-grounded features. Our results
suggest that visual-only features offer greater versatility for a broader range of
downstream tasks, although geometry-grounded features contain more geometric
detail. Notably, our findings underscore the necessity of future research on effective
strategies for geometry-grounding that augment the versatility and performance of
pretrained semantic features.
Supplementary Material: pdf
Primary Area: applications to robotics, autonomy, planning
Submission Number: 18317