Abstract: Monocular depth estimation is fundamental for 3D scene
understanding and downstream applications. However, even
under the supervised setup, it is still challenging and ill-posed
due to the lack of full geometric constraints. Although
a scene can consist of millions of pixels, there are fewer
high-level patterns. We propose iDisc to learn those patterns
with internal discretized representations. The method implicitly
partitions the scene into a set of high-level patterns.
In particular, our new module, Internal Discretization (ID),
implements a continuous-discrete-continuous bottleneck to
learn those concepts without supervision. In contrast to
state-of-the-art methods, the proposed model does not enforce
any explicit constraints or priors on the depth output.
Since the bottleneck module is based on attention, the whole network,
including the ID module, can be trained end-to-end.
Our method sets the new state of the art with significant
improvements on NYU-Depth v2 and KITTI, outperforming
all published methods on the official KITTI benchmark.
iDisc can also achieve state-of-the-art results on surface
normal estimation. Further, we explore the model's generalization
capability via zero-shot testing and observe a compelling need for
more diverse outdoor benchmarks. Hence, we introduce splits of two
autonomous driving datasets, DDAD and Argoverse. Code is available
at http://vis.xyz/pub/idisc
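For intuition only, below is a minimal sketch (not the authors' released code) of a continuous-discrete-continuous attention bottleneck of the kind the abstract describes: dense pixel features are summarized into a small set of learned queries via cross-attention, then broadcast back to every pixel. The class name IDBottleneck and parameters such as num_idrs and dim are illustrative assumptions, not names from the paper.

```python
import torch
import torch.nn as nn

class IDBottleneck(nn.Module):
    """Sketch of a continuous-discrete-continuous attention bottleneck."""

    def __init__(self, dim: int = 256, num_idrs: int = 32, num_heads: int = 4):
        super().__init__()
        # Learnable queries acting as the internal discrete representations.
        self.idrs = nn.Parameter(torch.randn(num_idrs, dim))
        # Continuous -> discrete: the discrete representations attend to pixels.
        self.to_discrete = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Discrete -> continuous: pixels attend back to the discrete representations.
        self.to_continuous = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) dense feature map from an encoder.
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)            # (B, H*W, C)
        queries = self.idrs.unsqueeze(0).expand(b, -1, -1)   # (B, K, C)
        # Summarize millions of pixels into K high-level patterns.
        discrete, _ = self.to_discrete(queries, tokens, tokens)
        # Re-expand the discrete summary to a dense, per-pixel representation.
        refined, _ = self.to_continuous(tokens, discrete, discrete)
        return refined.transpose(1, 2).reshape(b, c, h, w)

# Usage: out = IDBottleneck()(torch.randn(2, 256, 30, 40))
```

The actual iDisc architecture is more involved; this sketch only conveys how an attention-based bottleneck can remain differentiable end-to-end while forcing information through a small set of learned representations.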