Depth Estimation of Multi-Modal Scene Based on Multi-Scale Modulation

Published: 01 Jan 2023 · Last Modified: 08 Apr 2025 · ICIP 2023 · License: CC BY-SA 4.0
Abstract: Because multimodal information is complementary, effectively exploiting the multimodal information of a scene has become an increasingly important research topic. This paper proposes a novel multi-scale global learning strategy that takes both echo and visual data as inputs to estimate scene depth. The framework builds a multi-scale feature extractor on pyramid pooling modules, which aggregate contextual information from different regions and improve the acquisition of global information. A recurrent multi-scale feature modulation module is then introduced to produce more semantic and spatially accurate representations at each iterative update. Finally, a multi-scale fusion method combines the echo and visual modalities. Extensive experiments on the Replica dataset demonstrate the superior performance of the proposed method.
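The abstract names three components (pyramid pooling for multi-scale context, a recurrent feature modulation module, and echo-visual fusion) but does not spell out their architecture. Below is a minimal PyTorch sketch of how such components are commonly built, not the paper's actual implementation: a PSPNet-style pyramid pooling module, a hypothetical broadcast-and-concatenate echo-visual fusion block, and a RAFT-style ConvGRU cell standing in for the recurrent modulation module. All class names, channel sizes, pooling grid sizes, and the iteration count are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling: pool features over several grid sizes,
    project each pooled map with a 1x1 conv, upsample back to the input
    resolution, and concatenate everything with the input features."""

    def __init__(self, in_channels, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = in_channels // len(pool_sizes)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),
                nn.Conv2d(in_channels, branch_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for size in pool_sizes
        )
        self.out_channels = in_channels + branch_channels * len(pool_sizes)

    def forward(self, x):
        h, w = x.shape[-2:]
        pooled = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return torch.cat([x] + pooled, dim=1)


class EchoVisualFusion(nn.Module):
    """Hypothetical fusion block: broadcast a global echo embedding over the
    visual feature map and merge the two streams with a 1x1 convolution."""

    def __init__(self, visual_channels, echo_dim):
        super().__init__()
        self.fuse = nn.Conv2d(visual_channels + echo_dim, visual_channels, kernel_size=1)

    def forward(self, visual_feat, echo_feat):
        b, _, h, w = visual_feat.shape
        echo_map = echo_feat[:, :, None, None].expand(b, -1, h, w)
        return self.fuse(torch.cat([visual_feat, echo_map], dim=1))


class ConvGRUCell(nn.Module):
    """RAFT-style convolutional GRU, used here as a stand-in for the paper's
    recurrent multi-scale feature modulation module."""

    def __init__(self, hidden_channels, input_channels):
        super().__init__()
        cat_channels = hidden_channels + input_channels
        self.convz = nn.Conv2d(cat_channels, hidden_channels, 3, padding=1)
        self.convr = nn.Conv2d(cat_channels, hidden_channels, 3, padding=1)
        self.convq = nn.Conv2d(cat_channels, hidden_channels, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))            # update gate
        r = torch.sigmoid(self.convr(hx))            # reset gate
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q


if __name__ == "__main__":
    ppm = PyramidPooling(in_channels=256)  # 256 -> 512 channels after concat
    fusion = EchoVisualFusion(visual_channels=ppm.out_channels, echo_dim=128)
    gru = ConvGRUCell(hidden_channels=ppm.out_channels, input_channels=ppm.out_channels)

    visual = torch.randn(2, 256, 32, 32)   # visual backbone features (assumed shape)
    echo = torch.randn(2, 128)             # pooled echo embedding (assumed shape)

    context = fusion(ppm(visual), echo)    # multi-scale, multi-modal context
    hidden = torch.zeros_like(context)     # initial hidden state
    for _ in range(3):                     # iterative modulation updates
        hidden = gru(hidden, context)
    print(hidden.shape)                    # torch.Size([2, 512, 32, 32])
```

The pooling grid sizes (1, 2, 3, 6) follow the original PSPNet configuration; the abstract only states that contextual information is aggregated from different regions, so the actual grids, fusion scheme, and number of recurrent iterations may differ.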