Depth Estimation of Multi-Modal Scene Based on Multi-Scale Modulation

Published: 01 Jan 2023 · Last Modified: 08 Apr 2025 · ICIP 2023 · License: CC BY-SA 4.0
Abstract: Because multimodal information is complementary, effectively exploiting the multimodal information of a scene has become an increasingly important research topic. This paper proposes a novel multi-scale global learning strategy that takes both echo and visual data as inputs to estimate scene depth. The framework builds a multi-scale feature extractor on pyramid pooling modules, which aggregate contextual information from different regions and improve the acquisition of global information. A recurrent multi-scale feature modulation module is then introduced to produce more semantic and spatially accurate representations at each iterative update. Finally, a multi-scale fusion method combines the echo and visual modalities. Extensive experiments on the Replica dataset demonstrate the superior performance of the proposed method.
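The abstract names three components (pyramid pooling for multi-scale context, a recurrent feature modulation module, and echo-visual fusion) but does not spell out their architecture. Below is a minimal PyTorch sketch of how such components are commonly built, not the paper's actual implementation: a PSPNet-style pyramid pooling module, a hypothetical broadcast-and-concatenate echo-visual fusion block, and a RAFT-style ConvGRU cell standing in for the recurrent modulation module. All class names, channel sizes, pooling grid sizes, and the iteration count are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling: pool features over several grid sizes,
    project each pooled map with a 1x1 conv, upsample back to the input
    resolution, and concatenate everything with the input features."""

    def __init__(self, in_channels, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = in_channels // len(pool_sizes)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),
                nn.Conv2d(in_channels, branch_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for size in pool_sizes
        )
        self.out_channels = in_channels + branch_channels * len(pool_sizes)

    def forward(self, x):
        h, w = x.shape[-2:]
        pooled = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return torch.cat([x] + pooled, dim=1)


class EchoVisualFusion(nn.Module):
    """Hypothetical fusion block: broadcast a global echo embedding over the
    visual feature map and merge the two streams with a 1x1 convolution."""

    def __init__(self, visual_channels, echo_dim):
        super().__init__()
        self.fuse = nn.Conv2d(visual_channels + echo_dim, visual_channels, kernel_size=1)

    def forward(self, visual_feat, echo_feat):
        b, _, h, w = visual_feat.shape
        echo_map = echo_feat[:, :, None, None].expand(b, -1, h, w)
        return self.fuse(torch.cat([visual_feat, echo_map], dim=1))


class ConvGRUCell(nn.Module):
    """RAFT-style convolutional GRU, used here as a stand-in for the paper's
    recurrent multi-scale feature modulation module."""

    def __init__(self, hidden_channels, input_channels):
        super().__init__()
        cat_channels = hidden_channels + input_channels
        self.convz = nn.Conv2d(cat_channels, hidden_channels, 3, padding=1)
        self.convr = nn.Conv2d(cat_channels, hidden_channels, 3, padding=1)
        self.convq = nn.Conv2d(cat_channels, hidden_channels, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))            # update gate
        r = torch.sigmoid(self.convr(hx))            # reset gate
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q


if __name__ == "__main__":
    ppm = PyramidPooling(in_channels=256)  # 256 -> 512 channels after concat
    fusion = EchoVisualFusion(visual_channels=ppm.out_channels, echo_dim=128)
    gru = ConvGRUCell(hidden_channels=ppm.out_channels, input_channels=ppm.out_channels)

    visual = torch.randn(2, 256, 32, 32)   # visual backbone features (assumed shape)
    echo = torch.randn(2, 128)             # pooled echo embedding (assumed shape)

    context = fusion(ppm(visual), echo)    # multi-scale, multi-modal context
    hidden = torch.zeros_like(context)     # initial hidden state
    for _ in range(3):                     # iterative modulation updates
        hidden = gru(hidden, context)
    print(hidden.shape)                    # torch.Size([2, 512, 32, 32])
```

The pooling grid sizes (1, 2, 3, 6) follow the original PSPNet configuration; the abstract only states that contextual information is aggregated from different regions, so the actual grids, fusion scheme, and number of recurrent iterations may differ.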