LR2Depth: Large-Region Aggregation at Low Resolution for Efficient Monocular Depth Estimation

Chao Ning, Weihao Xuan, Wanshui Gan, Naoto Yokoya

Published: 2025, Last Modified: 24 Jan 2026, IROS 2025, CC BY-SA 4.0

Abstract: Monocular depth estimation (MDE) is crucial for various computer vision applications, but existing methods often struggle to balance inference speed and accuracy when processing large-region visual information. This paper introduces LR2Depth, a novel MDE method that addresses this challenge by utilizing large-kernel convolution on low-resolution feature maps for efficient large-region feature aggregation. Our approach leverages the fact that each pixel on low-resolution feature maps corresponds to a larger region of the original image, allowing for fast and accurate depth predictions at a lower inference cost. Extensive experiments on NYU-Depth-V2, KITTI, and SUN RGB-D datasets demonstrate that LR2Depth not only achieves state-of-the-art performance but also operates approximately twice as fast as previous MDE methods. Notably, at the time of submission, LR2Depth secured the top-1 position on the KITTI depth prediction online benchmark. The code is available on the project page.
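
The intuition that a pixel on a low-resolution feature map stands for a larger image region can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only and is not taken from the paper: the kernel size (7), the downsampling strides (32 and 4), the channel count (64), the 480x640 input size, and the choice of a depthwise convolution are all assumptions, and the helper names are hypothetical. It also ignores the backbone's own receptive field and treats each feature-map pixel as one stride x stride image patch.

```python
def covered_input_span(kernel_size: int, stride: int) -> int:
    """Side length (in input pixels) spanned by a k x k kernel applied to a
    feature map downsampled by `stride`, assuming each feature-map pixel
    corresponds to one stride x stride patch of the input image."""
    return kernel_size * stride


def equivalent_kernel(span: int, stride: int) -> int:
    """Kernel size needed at a given stride to span `span` input pixels
    (same simplifying assumption as above)."""
    return span // stride


def depthwise_conv_macs(height: int, width: int, channels: int,
                        kernel_size: int, stride: int) -> int:
    """Multiply-accumulates for a depthwise k x k convolution over a
    feature map of size (height/stride) x (width/stride)."""
    h, w = height // stride, width // stride
    return h * w * channels * kernel_size * kernel_size


# Assumed setting: a 7x7 kernel on a 1/32-resolution map of a 480x640 image.
span = covered_input_span(kernel_size=7, stride=32)    # 224 input pixels
k_hi = equivalent_kernel(span, stride=4)               # 56 at 1/4 resolution

low_res = depthwise_conv_macs(480, 640, 64, kernel_size=7, stride=32)
high_res = depthwise_conv_macs(480, 640, 64, kernel_size=k_hi, stride=4)

print(span, k_hi, high_res // low_res)  # 224 56 4096
```

Under these assumptions, covering the same 224-pixel span at 1/4 resolution would take a 56x56 kernel and about 4096 times as many multiply-accumulates, which is the kind of gap that makes aggregating over large regions at low resolution attractive.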

External IDs: dblp:conf/iros/NingXGY25