Abstract: This paper addresses the challenge of reconstructing a scene with a neural radiance field (NeRF) for robot vision and scene understanding using multiple modalities. NeRF has been introduced to represent a scene for synthesizing novel views of complex scenes by optimizing a 3-D radiance field that is rendered into 2-D RGB images via ray casting. However, RGB supervision alone introduces geometric ambiguities for transparent objects and complex scenes and cannot accurately recover 3-D shape. We address this problem by feeding multiple modalities into the same NeRF model, building a multimodal NeRF that incorporates point cloud and infrared image supervision to reduce such bias. Unlike RGB images, infrared images and point clouds are typically captured by separate sensors that are not pre-aligned with the RGB camera. We therefore introduce an alignment of the different modalities based on point cloud registration, which estimates the relative transformation matrices between the sensors before the multimodal NeRF is trained. We evaluate our model on selected scenes from the ScanNet and M2DGR datasets and demonstrate that it outperforms existing state-of-the-art methods.
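As a rough illustration of the cross-sensor alignment step described in the abstract, the sketch below estimates a relative transformation between two sensor frames via point cloud registration. The use of Open3D's point-to-plane ICP, the helper name `estimate_relative_transform`, and all parameter values are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumptions, not the paper's code): register a point cloud
# observed in the infrared/depth sensor frame against one in the RGB camera
# frame to obtain the 4x4 relative transformation used to align modalities
# before multimodal NeRF training.
import numpy as np
import open3d as o3d

def estimate_relative_transform(source_pts: np.ndarray,
                                target_pts: np.ndarray,
                                voxel_size: float = 0.05) -> np.ndarray:
    """Return a 4x4 transform mapping source coordinates into the target frame."""
    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(source_pts)
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(target_pts)

    # Downsample and estimate normals (required for point-to-plane ICP).
    source = source.voxel_down_sample(voxel_size)
    target = target.voxel_down_sample(voxel_size)
    for pcd in (source, target):
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=2 * voxel_size,
                                                 max_nn=30))

    result = o3d.pipelines.registration.registration_icp(
        source, target,
        max_correspondence_distance=2 * voxel_size,
        init=np.eye(4),
        estimation_method=
            o3d.pipelines.registration.TransformationEstimationPointToPlane(),
        criteria=o3d.pipelines.registration.ICPConvergenceCriteria(
            max_iteration=100))
    return result.transformation  # 4x4 rotation + translation

# Hypothetical usage: map infrared-sensor poses into the RGB camera frame
# before adding infrared rays to the NeRF training batch.
# T_ir_to_rgb = estimate_relative_transform(ir_cloud, rgb_cloud)
# pose_in_rgb_frame = T_ir_to_rgb @ pose_in_ir_frame
```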