Learning Guided Implicit Depth Function With Scale-Aware Feature Fusion

Published: 01 Jan 2025, Last Modified: 22 Jul 2025. IEEE Trans. Image Process. 2025. License: CC BY-SA 4.0
Abstract: Single image super-resolution based on implicit image functions has recently become a popular topic, as it learns a single universal model for arbitrary upsampling scales. By contrast, color-guided depth map super-resolution based on implicit function learning remains less explored. This line of research faces three questions. First, is it necessary and applicable to fuse the depth feature and the color feature in the encoder under continuous upsampling scales? Second, is the scale information in the encoder as important as that in the decoder? Third, how can the affinity of location distance and content similarity across domains be modeled efficiently and effectively in the decoder? This paper proposes a transformer-based network to answer these questions, consisting of a depth super-resolution branch and a guidance extraction branch. Specifically, in the encoder, an implicit cross transformer is designed to fuse the guidance from the color feature with continuous coordinate mapping, and unrelated guidance is filtered out by correlation evaluation in the high-dimensional feature space. Unlike prior work that introduces the scale only in the decoder, this paper additionally embeds the scale into the position encoding and the feed-forward network of the encoder to learn a scale-aware feature representation. In the decoder, the high-resolution depth feature is reconstructed from an internal prior and external guidance: the internal prior is realized by implicit self-attention within the depth super-resolution branch, and the external guidance is exploited via implicit cross-attention between the two branches. Finally, the decoded features are combined to generate the high-resolution depth map. Extensive experiments on synthetic and real datasets, for both in-distribution and out-of-distribution upsampling scales, validate the improved performance. The code and models are available at https://github.com/NaNRan13/GIDF
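
The abstract describes the architecture only at a high level; the sketch below is a minimal, self-contained illustration of one of the ideas it mentions, namely querying a low-resolution depth feature at arbitrary continuous coordinates while embedding the continuous scale factor into the positional encoding. It is not the authors' released implementation (see the repository linked above for that); all class, function, and parameter names here, such as ScaleAwareImplicitDecoder, cell_center_coords, and feat_dim, are hypothetical placeholders, and the implicit cross-attention between the depth and guidance branches is omitted for brevity.

```python
# Minimal sketch of scale-aware implicit querying (assumed form, not the paper's code):
# a low-resolution feature map is sampled at continuous coordinates, and the
# relative offset, query cell size, and continuous scale are fed to an MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F


def cell_center_coords(h, w, device):
    """Normalized cell-centre coordinates in [-1, 1] for an h x w grid."""
    ys = ((torch.arange(h, device=device) + 0.5) / h) * 2 - 1
    xs = ((torch.arange(w, device=device) + 0.5) / w) * 2 - 1
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=-1)  # (h, w, 2), x first for grid_sample


class ScaleAwareImplicitDecoder(nn.Module):
    """Predicts a value at each continuous query coordinate from LR features."""

    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        # input: LR feature + relative offset (2) + query cell size (2) + scale (1)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 5, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, feat_lr, out_hw, scale):
        b, c, h, w = feat_lr.shape
        H, W = out_hw
        device = feat_lr.device
        coords_hr = cell_center_coords(H, W, device)                    # (H, W, 2)
        grid = coords_hr.unsqueeze(0).expand(b, -1, -1, -1)             # (b, H, W, 2)

        # nearest LR feature and the coordinate of that LR cell centre
        feat_q = F.grid_sample(feat_lr, grid, mode="nearest",
                               align_corners=False)                     # (b, c, H, W)
        coords_lr = cell_center_coords(h, w, device).permute(2, 0, 1)   # (2, h, w)
        coord_q = F.grid_sample(coords_lr.unsqueeze(0).expand(b, -1, -1, -1),
                                grid, mode="nearest", align_corners=False)

        # relative offset between query coordinate and sampled LR cell centre
        rel = coords_hr.permute(2, 0, 1).unsqueeze(0) - coord_q         # (b, 2, H, W)
        # query cell size and continuous scale, broadcast over the HR grid
        cell = rel.new_tensor([2.0 / W, 2.0 / H]).view(1, 2, 1, 1).expand(b, -1, H, W)
        s = rel.new_full((b, 1, H, W), float(scale))

        x = torch.cat([feat_q, rel, cell, s], dim=1).permute(0, 2, 3, 1)
        return self.mlp(x).permute(0, 3, 1, 2)                          # (b, 1, H, W)


# usage: upsample a 48x48 feature map to 100x100 (a non-integer scale of ~2.08)
if __name__ == "__main__":
    dec = ScaleAwareImplicitDecoder(feat_dim=64)
    lr_feat = torch.randn(1, 64, 48, 48)
    hr_depth = dec(lr_feat, out_hw=(100, 100), scale=100 / 48)
    print(hr_depth.shape)  # torch.Size([1, 1, 100, 100])
```

Because the scale enters only as an extra conditioning input, the same decoder weights serve any in-distribution or out-of-distribution upsampling ratio, which is the property the paper's scale-aware design aims to exploit.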