Abstract: Monocular depth estimation (MDE) is crucial for various computer vision applications, but existing methods often struggle to balance inference speed and accuracy when processing large-region visual information. This paper introduces LR<sup>2</sup>Depth, a novel MDE method that addresses this challenge by utilizing large-kernel convolution on low-resolution feature maps for efficient large-region feature aggregation. Our approach leverages the fact that each pixel on low-resolution feature maps corresponds to a larger region of the original image, allowing for fast and accurate depth predictions at a lower inference cost. Extensive experiments on NYU-Depth-V2, KITTI, and SUN RGB-D datasets demonstrate that LR<sup>2</sup>Depth not only achieves state-of-the-art performance but also operates approximately twice as fast as previous MDE methods. Notably, at the time of submission, LR<sup>2</sup>Depth secured the top-1 position on the KITTI depth prediction online benchmark. The code is available in the project page.
External IDs:dblp:conf/iros/NingXGY25
Loading