Abstract: This paper introduces the problem of long-range
monocular depth estimation for outdoor urban environments.
Range sensors and traditional depth estimation algorithms
(both stereo and single-view) are typically limited to distances of less
than 100 meters in outdoor settings and 10 meters in indoor
settings. The shortcomings of learning-based outdoor single-view
methods stem, to some extent, from the lack
of long-range ground-truth training data, which in turn is due
to the limitations of range sensors. To circumvent this, we first
propose a novel strategy for generating synthetic long-range
ground truth depth data. We utilize Google Earth images to
reconstruct large-scale 3D models of different cities with proper
scale. The resulting repository of 3D models, their associated
RGB views, and the corresponding long-range depth renderings is
used as training data for depth prediction. We then train two
deep neural network models for long-range depth estimation:
i) a Convolutional Neural Network (CNN) and ii) a Generative
Adversarial Network (GAN). We found in our experiments that
the GAN model predicts depth more accurately. We plan to
open-source the database and the baseline models for public
use.
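
As a rough illustration of the depth-regression setup summarized above, the sketch below shows a toy encoder-decoder CNN trained with an L1 loss on (RGB, long-range depth) pairs, such as those that could be rendered from the reconstructed 3D city models. The architecture, loss, and hyperparameters are placeholders for illustration only and are not the models described in the paper.

```python
# A minimal sketch (not the paper's architecture) of a CNN depth regressor
# trained on (RGB, long-range depth) pairs. All names and hyperparameters
# here are illustrative assumptions.
import torch
import torch.nn as nn

class TinyDepthCNN(nn.Module):
    """Encoder-decoder CNN mapping an RGB image to a dense depth map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            nn.Softplus(),  # keep predicted depth non-negative
        )

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))

def train_step(model, optimizer, rgb, depth_gt):
    """One L1-regression step; log-depth losses are also common for long ranges."""
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(model(rgb), depth_gt)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = TinyDepthCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Dummy batch standing in for rendered RGB views and long-range depth maps.
    rgb = torch.rand(2, 3, 128, 128)
    depth = torch.rand(2, 1, 128, 128) * 1000.0  # metres, up to ~1 km
    print(train_step(model, opt, rgb, depth))
```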