CANet: Cross-Scale and Cross-Modality Aggregation Network for Scene Depth Super-Resolution

Published: 01 Jan 2024 · Last Modified: 13 Apr 2025 · IEEE Transactions on Multimedia, 2024 · CC BY-SA 4.0
Abstract: Existing depth super-resolution (DSR) methods typically utilize an additional high-resolution (HR) color image of the same scene as guidance to recover the low-resolution (LR) depth map. Although these color-guided methods have achieved impressive progress, they are prone to color image under-utilization and mis-utilization issues. In this article, we thoroughly investigate the above problems and propose a novel DSR framework to alleviate them. Specifically, we propose a Cross-scale and Cross-modality Aggregation Network (C$^{2}$ANet) to learn abundant and accurate complementary information from color images to help recover the degraded depth map. Our C$^{2}$ANet simultaneously extracts multi-scale representations from color images with parallel network hierarchies, and effectively aggregates cross-scale and cross-modality contexts to boost the HR representations in each hierarchy. Then, to use the guiding color image appropriately, we further design a Feature Aggregation Module (FAM) to adaptively select and fuse task-relevant features, which consists of (1) a feature alignment block that learns transformation offsets and aligns upsampled features with the targeted HR features, and (2) a feature fusion block based on a cross-attention mechanism that maintains strong structural context and suppresses texture distraction. Experimental results on synthetic and real-world benchmark datasets demonstrate the superiority of our proposed method in comparison with other state-of-the-art DSR methods.
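
To make the described Feature Aggregation Module more concrete, below is a minimal PyTorch sketch of the two blocks the abstract names: an offset-based alignment block and a cross-attention fusion block. All class names, channel sizes, and implementation details (e.g., grid-sample warping, multi-head attention, residual fusion) are illustrative assumptions for exposition, not the authors' actual implementation.

```python
# Hypothetical sketch of the Feature Aggregation Module (FAM) from the abstract.
# Names and design details are assumptions; this is not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAlignment(nn.Module):
    """Predicts per-pixel offsets from the concatenated features and warps the
    upsampled depth features toward the targeted HR (color-guided) features."""
    def __init__(self, channels):
        super().__init__()
        self.offset = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, up_feat, hr_feat):
        b, _, h, w = up_feat.shape
        # Offsets are predicted in normalized grid coordinates (assumption).
        offset = self.offset(torch.cat([up_feat, hr_feat], dim=1))  # (B, 2, H, W)
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=up_feat.device),
            torch.linspace(-1, 1, w, device=up_feat.device),
            indexing="ij",
        )
        base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        grid = base_grid + offset.permute(0, 2, 3, 1)
        return F.grid_sample(up_feat, grid, align_corners=True)


class CrossAttentionFusion(nn.Module):
    """Fuses aligned depth features (queries) with HR color features (keys/values),
    emphasizing structure-relevant context while suppressing texture distraction."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, depth_feat, color_feat):
        b, c, h, w = depth_feat.shape
        q = depth_feat.flatten(2).transpose(1, 2)   # (B, HW, C) queries from depth
        kv = color_feat.flatten(2).transpose(1, 2)  # (B, HW, C) keys/values from color
        fused, _ = self.attn(q, kv, kv)
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return depth_feat + self.proj(fused)        # residual fusion


class FAM(nn.Module):
    """Feature Aggregation Module: align, then fuse (illustrative composition)."""
    def __init__(self, channels=64):
        super().__init__()
        self.align = FeatureAlignment(channels)
        self.fuse = CrossAttentionFusion(channels)

    def forward(self, upsampled_depth_feat, hr_color_feat):
        aligned = self.align(upsampled_depth_feat, hr_color_feat)
        return self.fuse(aligned, hr_color_feat)


if __name__ == "__main__":
    fam = FAM(channels=64)
    depth = torch.randn(1, 64, 32, 32)   # upsampled LR depth features
    color = torch.randn(1, 64, 32, 32)   # HR color-guided features
    print(fam(depth, color).shape)        # torch.Size([1, 64, 32, 32])
```

The sketch only captures the abstract's high-level description: alignment compensates for the spatial mismatch between upsampled depth features and HR guidance, and cross-attention lets depth queries select structure-relevant color context rather than copying textures wholesale.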