Abstract: Despite the rapid development of single-photon LiDAR, accurate depth recovery remains a key challenge. Conventional deep learning methods such as CNNs and ViTs rely on convolution and self-attention to extract local and global features, respectively, yet both struggle to capture long-range dependencies in depth images, especially under low signal-to-background ratio (SBR) conditions. To address this, we propose Mamba-Unet-Depth, a novel network inspired by the Mamba architecture, which models long sequences and global context efficiently. By combining the hierarchical representation capability of U-Net with Mamba’s sequence-modeling strength, the proposed model uses skip connections to retain spatial details across scales, enabling richer feature learning and more effective extraction of both fine-grained and contextual depth cues from challenging LiDAR data. Experimental results on the NYU Depth v2 dataset show that Mamba-Unet-Depth outperforms existing baselines in depth prediction accuracy and robustness, achieving state-of-the-art performance.
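The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the kind of hybrid it describes: a small U-Net whose stages mix convolution with a sequence model over flattened spatial tokens, joined by skip connections. All names here (`MambaUNetDepth`, `SimpleSSMBlock`, `Stage`, `base_ch`) are illustrative, not the authors', and the state-space block is a simplified stand-in for a true selective-scan Mamba layer.

```python
# Hypothetical sketch of a Mamba-style U-Net for depth recovery.
# Not the paper's architecture; only the general idea from the abstract.
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Toy linear state-space recurrence over a token sequence.
    A real Mamba block uses input-dependent (selective) parameters and a
    hardware-aware parallel scan; this loop is only a placeholder."""
    def __init__(self, dim, state_dim=16):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.in_proj = nn.Linear(dim, dim)
        self.A = nn.Parameter(torch.randn(state_dim) * 0.01)  # per-state decay logits
        self.B = nn.Linear(dim, state_dim)                    # input -> state
        self.C = nn.Linear(state_dim, dim)                    # state -> output
        self.gate = nn.Linear(dim, dim)                       # Mamba-style gating

    def forward(self, x):                       # x: (B, L, D)
        residual = x
        x = self.norm(x)
        u = self.in_proj(x)
        h = torch.zeros(x.size(0), self.A.numel(), device=x.device)
        decay = torch.sigmoid(self.A)           # decay in (0, 1) keeps the scan stable
        outs = []
        for t in range(x.size(1)):              # sequential scan over tokens
            h = decay * h + self.B(u[:, t])
            outs.append(self.C(h))
        y = torch.stack(outs, dim=1) * torch.sigmoid(self.gate(x))
        return y + residual

class Stage(nn.Module):
    """Conv feature extraction (local cues) followed by sequence modeling
    of the flattened H*W token grid (long-range context)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.GELU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.GELU())
        self.ssm = SimpleSSMBlock(out_ch)

    def forward(self, x):
        x = self.conv(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C)
        tokens = self.ssm(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class MambaUNetDepth(nn.Module):
    def __init__(self, in_ch=1, base_ch=32):
        super().__init__()
        self.enc1 = Stage(in_ch, base_ch)
        self.enc2 = Stage(base_ch, base_ch * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = Stage(base_ch * 2, base_ch * 4)
        self.up2 = nn.ConvTranspose2d(base_ch * 4, base_ch * 2, 2, stride=2)
        self.dec2 = Stage(base_ch * 4, base_ch * 2)
        self.up1 = nn.ConvTranspose2d(base_ch * 2, base_ch, 2, stride=2)
        self.dec1 = Stage(base_ch * 2, base_ch)
        self.head = nn.Conv2d(base_ch, 1, 1)    # per-pixel depth prediction

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

# Usage: a single-channel 64x64 measurement map -> dense depth map.
depth = MambaUNetDepth()(torch.randn(1, 1, 64, 64))
print(depth.shape)  # torch.Size([1, 1, 64, 64])
```

The skip connections concatenate encoder features with upsampled decoder features at matching scales, which is how the abstract's "spatial details across scales" would be retained; the sequence blocks supply the long-range modeling that plain convolutions lack.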