Abstract: The remarkable success of neural radiance fields in low-level vision tasks such as novel view synthesis has motivated their extension to high-level semantic understanding, giving rise to the concept of the neural semantic field (NeSF). NeSF aims to simultaneously synthesize novel-view images and the associated semantic segmentation maps. Generalizable NeSF is a particularly appealing direction, as it can generalize to unseen scenes when synthesizing images and semantic maps for novel views, thereby avoiding tedious per-scene optimization. However, existing approaches to generalizable NeSF fall short in fully exploiting geometric and semantic features as well as their mutual interactions, resulting in suboptimal performance in both novel-view image synthesis and semantic segmentation. To address this limitation, we propose Geometry-Semantics Synergy for Generalized Neural Semantic Fields (GS$^2$-GNeSF), a novel approach that improves the performance of generalizable NeSF through the comprehensive construction and synergistic interaction of geometric and semantic features.
In GS$^2$-GNeSF, we introduce a robust geometric prior generator that produces cost volumes and a depth prior, which aid in constructing geometric features and facilitate geometry-aware sampling. Leveraging the depth prior, we additionally construct a global semantic context for the target view. This context provides two types of compensation information, obtained through boundary detection and semantic segmentation respectively, to enhance the geometric and semantic features. Finally, we present an efficient dual-directional interactive attention mechanism to foster deep interactions between the enhanced geometric and semantic features.
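To make the dual-directional interaction concrete, the following is a minimal sketch assuming a PyTorch-style implementation; the module and tensor names (`DualDirectionalInteractiveAttention`, `geo_feat`, `sem_feat`) are illustrative placeholders, not the paper's code. Each stream cross-attends to the other and is updated residually.

```python
# Minimal sketch (illustrative, not the authors' implementation) of a
# dual-directional interactive attention block: geometric features attend to
# semantic features and vice versa, then each stream is updated residually.
import torch
import torch.nn as nn


class DualDirectionalInteractiveAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # One cross-attention module per direction.
        self.geo_to_sem = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.sem_to_geo = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_geo = nn.LayerNorm(dim)
        self.norm_sem = nn.LayerNorm(dim)

    def forward(self, geo_feat: torch.Tensor, sem_feat: torch.Tensor):
        # geo_feat, sem_feat: (batch, num_tokens, dim) token sequences,
        # e.g. features of samples along the rays of a target view.
        # Geometry queries semantics: inject semantic context into geometry.
        geo_upd, _ = self.geo_to_sem(query=geo_feat, key=sem_feat, value=sem_feat)
        # Semantics queries geometry: inject geometric context into semantics.
        sem_upd, _ = self.sem_to_geo(query=sem_feat, key=geo_feat, value=geo_feat)
        # Residual updates keep each stream's original information.
        return self.norm_geo(geo_feat + geo_upd), self.norm_sem(sem_feat + sem_upd)


# Usage: fuse 64 per-ray tokens with 128-dim geometric and semantic features.
block = DualDirectionalInteractiveAttention(dim=128)
geo, sem = block(torch.randn(2, 64, 128), torch.randn(2, 64, 128))
```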
Experiments on both synthetic and real datasets demonstrate that GS$^2$-GNeSF outperforms existing methods in both novel-view image and semantic map synthesis, highlighting its effectiveness in generalizing neural semantic fields to unseen scenes.
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: This work contributes to multimedia processing by introducing a new approach named Geometry-Semantics Synergy for Generalized Neural Semantic Fields (GS$^2$-GNeSF), which synthesizes novel-view images and their corresponding semantic segmentation maps. The task requires raising the interaction and fusion of geometric and semantic features in constructing neural semantic fields (NeSF), particularly generalizable NeSF, to a new level. GS$^2$-GNeSF not only surpasses existing generalizable NeSF methods in both novel-view image synthesis and semantic map synthesis, but also shows how the interaction and fusion of geometric and semantic features can be effectively applied to generalization across unseen scenes. This is an important advancement for multimedia and multimodal processing, as it provides a new way to understand and generate rich multimodal content, which is crucial for improving the performance and realism of machine vision, augmented reality, and other applications.
Supplementary Material: zip
Submission Number: 2806