Structure-Aware Neural Radiance Fields without Posed Camera
Abstract: Neural radiance fields (NeRF) for realistic novel view synthesis require camera poses to be pre-acquired
by a structure-from-motion (SfM) approach. This two-stage strategy is inconvenient to use and degrades
performance because errors in pose extraction propagate into view synthesis. We integrate
pose extraction and view synthesis into a single, jointly optimized process so that they can benefit from each
other. For network training, only images are given, without pre-known camera poses. The camera poses are
obtained through a depth-consistency constraint: a feature matched across different views must have the same
world coordinates when transformed from the local camera coordinates according to the estimated poses. The
depth-consistency constraint is jointly optimized with the pixel color constraint. The poses are represented
by a CNN-based deep network whose input is the related frames. This joint optimization makes NeRF
aware of the scene's structure, resulting in improved generalization performance. Experiments on three
datasets demonstrate the effectiveness of both camera pose estimation and novel view synthesis. Code is available
at https://github.com/XTU-PR-LAB/SaNerf.
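
The depth-consistency constraint described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a pinhole intrinsic matrix `K`, per-view depths, and camera-to-world pose matrices, and penalizes the distance between the world coordinates that two views assign to a matched feature. The function names `to_world` and `depth_consistency_loss` are hypothetical.

```python
import numpy as np

def to_world(pixel, depth, K, c2w):
    """Back-project a pixel at the given depth into camera coordinates,
    then transform into world coordinates using the camera-to-world pose."""
    # homogeneous pixel -> ray direction in camera frame, scaled by depth
    x_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0]) * depth
    # rigid transform into world coordinates
    return c2w[:3, :3] @ x_cam + c2w[:3, 3]

def depth_consistency_loss(p1, d1, pose1, p2, d2, pose2, K):
    """Squared distance between the world points that two views assign
    to the same matched feature; zero when poses and depths agree."""
    X1 = to_world(p1, d1, K, pose1)
    X2 = to_world(p2, d2, K, pose2)
    return float(np.sum((X1 - X2) ** 2))
```

In training, this residual would be summed over matched features and minimized jointly with the photometric (pixel color) loss, so that pose and radiance-field parameters are updated together.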