OSNeRF: On-demand Semantic Neural Radiance Fields for Fast and Robust 3D Object Reconstruction

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: By leveraging multi-view inputs to synthesize novel-view images, Neural Radiance Fields (NeRF) have emerged as a prominent technique for 3D object reconstruction. However, existing methods primarily focus on global scene reconstruction from large datasets, which demands substantial computational resources and imposes high quality requirements on input images. In practical applications, moreover, users prioritize the 3D reconstruction of an on-demand specific object (OSO) according to their individual needs. Furthermore, images transmitted through a high-interference wireless environment (HIWE) degrade the accuracy of NeRF reconstruction, limiting its scalability. In this paper, we propose a novel on-demand Semantic Neural Radiance Fields (OSNeRF) scheme that offers fast and robust 3D object reconstruction for diverse tasks. Within OSNeRF, a semantic encoder extracts the core semantic features of OSOs from the collected scene images, a semantic decoder performs robust image recovery under HIWE conditions, and a lightweight renderer enables fast and efficient object reconstruction. Moreover, a semantic control unit (SCU) is introduced to guide these components, further improving reconstruction efficiency. Experiments demonstrate that OSNeRF enables fast and robust object reconstruction under HIWE conditions, surpassing state-of-the-art (SOTA) methods in reconstruction quality.
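The abstract describes a pipeline of four components: a semantic encoder, a channel-robust semantic decoder, a lightweight renderer, and a semantic control unit (SCU). The sketch below is a minimal, hypothetical PyTorch rendition of the encoder-channel-decoder path only; all module names, layer choices, and the AWGN stand-in for the HIWE channel are illustrative assumptions rather than the authors' implementation, and the renderer and SCU are omitted for brevity.

```python
# Hypothetical sketch of the OSNeRF encoder -> channel -> decoder path.
# Module names and architectures are illustrative; the paper publishes no API.
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):
    """Extracts core semantic features of the on-demand specific object (OSO)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, images):            # (B, 3, H, W)
        return self.backbone(images)      # (B, feat_dim, H/4, W/4)

class SemanticDecoder(nn.Module):
    """Recovers images from semantic features received over a noisy channel."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, feats):
        return self.head(feats)

def awgn_channel(feats, snr_db=5.0):
    """Toy additive-white-Gaussian-noise stand-in for the HIWE channel."""
    signal_power = feats.pow(2).mean()
    noise_power = signal_power / (10 ** (snr_db / 10))
    return feats + noise_power.sqrt() * torch.randn_like(feats)

# End to end: encode -> noisy channel -> decode; the recovered views would
# then be fed to the lightweight renderer for object reconstruction.
enc, dec = SemanticEncoder(), SemanticDecoder()
views = torch.rand(4, 3, 128, 128)            # collected scene images
recovered = dec(awgn_channel(enc(views)))     # robust recovery under HIWE
print(recovered.shape)                        # torch.Size([4, 3, 128, 128])
```

Transmitting compact semantic features instead of raw pixels is what makes the recovery step channel-robust in this reading: the decoder only has to denoise a low-dimensional representation of the OSO rather than the full scene.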
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Generation] Generative Multimedia, [Experience] Multimedia Applications, [Generation] Multimedia Foundation Models
Relevance To Conference: The proposed OSNeRF can contribute to multimedia/multimodal processing through its applications and potential integration with other systems, summarized as follows:
1. Enhanced visual representations: By generating realistic 3D reconstructions, OSNeRF provides more detailed and accurate visual representations of objects and scenes. These reconstructions can serve as visual input in multimedia processing pipelines, supporting tasks such as object recognition, scene understanding, and visual content analysis.
2. Augmented reality (AR) and virtual reality (VR): OSNeRF can create immersive AR and VR experiences by synthesizing realistic 3D content from 2D images or videos. This enables the integration of virtual objects into real-world environments, enhancing the multimodal nature of AR/VR systems that combine visuals with other sensory inputs such as audio and haptic feedback.
3. Sensor fusion: In multimodal processing, combining information from different sensors is crucial for a comprehensive understanding of the environment. OSNeRF can fuse visual information with data from other sensors into semantic information, leading to improved multimodal perception and understanding of the scene.
Supplementary Material: zip
Submission Number: 5611