MG-NeRF: Multimodal Representation Learning for Generalizable NeRF

21 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Neural Radiance Field, Multimodal, Generalization
TL;DR: We propose MG-NeRF, a novel learning framework that incorporates multimodal information to improve the generalization performance of NeRF.
Abstract: Neural Radiance Field (NeRF) is a novel view synthesis model for implicit representation of three-dimensional space, which has swept the field of computer vision. However, vanilla NeRF overfits to a single scene and fails to synthesize novel views of unseen scenes. Learning a per-scene representation is expensive to train, so the generalization capability of NeRF has attracted tremendous attention. Previous works address this challenge with multi-view learning methods and achieve remarkable results, casting novel view synthesis as a multi-view image-based interpolation problem. However, these methods focus solely on the image modality and overlook other meaningful multimodal knowledge. We propose MG-NeRF, a novel learning framework that incorporates multimodal information to improve the generalization performance of NeRF. To the best of our knowledge, we are the first to apply multimodal learning to NeRF generalization. We employ a vision-language pre-training (VLP) framework to generate text information for each scene. Image and text features are then fused and fed to NeRF. Exploiting the alignment of the image and text modalities, we introduce a semantic loss that encourages NeRF to synthesize reasonable novel views. For positional encoding, a frequency regularization mechanism is introduced to prevent NeRF from overfitting to high-frequency information. We show that MG-NeRF achieves appreciable results on novel view synthesis of unseen scenes even when trained with considerably fewer resources than prior work. We will release our code upon acceptance.
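The abstract mentions two concrete mechanisms: a semantic loss defined in an aligned image-text embedding space, and a frequency regularization schedule on the positional encoding. Below is a minimal PyTorch sketch of how such components might look; the paper's code is not released, so all names, shapes, and the particular schedule (a FreeNeRF-style linear frequency unmasking) are illustrative assumptions rather than MG-NeRF's actual implementation.

```python
# Illustrative sketch only -- names, shapes, and hyperparameters are assumptions,
# not MG-NeRF's released code.
import torch
import torch.nn.functional as F


def positional_encoding(x, num_freqs, train_step, total_reg_steps):
    """Sinusoidal positional encoding with a frequency-regularization mask that
    linearly reveals higher-frequency bands as training progresses (one plausible
    way to keep NeRF from overfitting to high-frequency information early on)."""
    freqs = 2.0 ** torch.arange(num_freqs, device=x.device)           # (F,)
    angles = x[..., None] * freqs                                      # (..., D, F)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)    # (..., D, 2F)

    # Number of frequency bands currently visible, growing linearly with training.
    visible = num_freqs * min(train_step / total_reg_steps, 1.0)
    mask = (torch.arange(num_freqs, device=x.device) < visible).float()
    return (enc * mask.repeat(2)).flatten(start_dim=-2)                # (..., D * 2F)


def semantic_loss(rendered_image_feat, scene_text_feat):
    """Cosine-similarity loss encouraging the rendered view's image embedding to
    stay aligned with the scene's text embedding, both assumed to come from a
    shared vision-language embedding space produced by a VLP model."""
    img = F.normalize(rendered_image_feat, dim=-1)
    txt = F.normalize(scene_text_feat, dim=-1)
    return 1.0 - (img * txt).sum(dim=-1).mean()
```

In a setup like this, the semantic term would typically be added to the standard photometric rendering loss with a small weighting coefficient; the exact weighting and fusion architecture used by MG-NeRF are not specified in the abstract.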
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3188