Abstract: It has been shown that learning radiance fields with depth rendering and depth supervision can effectively improve the quality and convergence of view synthesis. However, this paradigm requires the input RGB-D sequences to be synchronized. In the UAV city-modeling scenario, the RGB and depth images are asynchronous because the solid-state LiDAR and the RGB camera operate at different frequencies. To synthesize high-quality views in such a scenario, we propose a novel time-pose function, an implicit network that maps timestamps to SE(3) elements. To train this function, we also design a joint optimization scheme that simultaneously learns the large-scale depth-regularized radiance fields and the time-pose function. Furthermore, we propose a large synthetic dataset with diverse controlled mismatches and ground truth to systematically evaluate this new problem setting. The proposed approach is evaluated both on the synthetic dataset and with a real drone. To assess the impact of view density, each algorithm is tested on three trajectories with varying view densities. Compared to state-of-the-art baseline methods, the proposed approach reduces reconstruction error by 35.26% in city-modeling scenarios. Our code is available at github.com/saythe17/AsyncNeRF.
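To make the core idea concrete, the sketch below illustrates what a time-pose function could look like: a small MLP that maps a scalar timestamp to an SE(3) pose, represented here as a 3-D translation plus a unit quaternion. The layer sizes, sinusoidal time encoding, and quaternion parameterization are illustrative assumptions for exposition, not the paper's actual architecture.

```python
# Minimal sketch (assumed architecture) of a time-pose function:
# timestamp -> SE(3) pose, parameterized as translation + unit quaternion.
import torch
import torch.nn as nn


class TimePoseFunction(nn.Module):
    def __init__(self, hidden_dim: int = 256, num_freqs: int = 6):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 2 * num_freqs + 1  # sinusoidal encoding of the timestamp
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 7),  # 3 translation + 4 quaternion components
        )

    def encode(self, t: torch.Tensor) -> torch.Tensor:
        # t: (N, 1) timestamps normalized to [0, 1]
        freqs = 2.0 ** torch.arange(self.num_freqs, device=t.device) * torch.pi
        angles = t * freqs  # (N, num_freqs)
        return torch.cat([t, torch.sin(angles), torch.cos(angles)], dim=-1)

    def forward(self, t: torch.Tensor):
        out = self.mlp(self.encode(t))
        translation = out[..., :3]
        quaternion = nn.functional.normalize(out[..., 3:], dim=-1)  # unit quaternion
        return translation, quaternion


# Usage: query the continuous trajectory at arbitrary timestamps, e.g. the
# timestamps of unsynchronized depth frames.
poses = TimePoseFunction()
t = torch.rand(8, 1)  # 8 query timestamps
trans, quat = poses(t)
print(trans.shape, quat.shape)  # torch.Size([8, 3]) torch.Size([8, 4])
```

In a joint optimization scheme of the kind described in the abstract, the predicted poses would be used to render RGB and depth at each sensor's own timestamps, so gradients from the photometric and depth losses can flow back into both the radiance field and this time-pose network.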