ETV-MVS: Robust Visibility-Aware Multi-View Stereo with Epipolar Line-Based Transformer

Published: 01 Jan 2025, Last Modified: 22 Jul 2025
Venue: Big Data Min. Anal. 2025
License: CC BY-SA 4.0
Abstract: Multi-View Stereo (MVS) is a pivotal technique in computer vision that reconstructs 3D models from multiple images by estimating depth maps. However, reconstruction performance is hindered by visibility challenges such as occlusions and non-overlapping regions. In this paper, we propose a visibility-aware framework to address these issues. Central to our method is an Epipolar Line-based Transformer (ELT) module, which exploits epipolar line correspondences and candidate matching features between images to enhance feature representation and correlation robustness. Furthermore, we propose a Supervised Visibility Estimation (SVE) module that estimates high-precision visibility maps, overcoming the limitations of previous methods that rely on indirect supervision. By integrating these modules, our method achieves state-of-the-art results on the benchmarks and produces high-quality reconstructions even in challenging regions. The code will be released at https://github.com/npucvr/ETV-MVS.
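To make the epipolar line correspondence mentioned in the abstract concrete, the following is a minimal NumPy sketch of the underlying two-view geometry. It is not the authors' ELT module. It only shows that the source-view projections of a reference pixel at different depth hypotheses all fall on one epipolar line, which is the set of candidate matches such a module could attend over. The function names, the pinhole intrinsics `K_ref` and `K_src`, and the pose convention `X_src = R @ X_ref + t` are assumptions made for illustration.

```python
import numpy as np


def skew(t):
    """Return the 3x3 cross-product matrix [t]_x of a 3-vector t."""
    return np.array(
        [[0.0, -t[2], t[1]],
         [t[2], 0.0, -t[0]],
         [-t[1], t[0], 0.0]]
    )


def fundamental_from_pose(K_ref, K_src, R, t):
    """Fundamental matrix F with x_src^T F x_ref = 0.

    Assumed convention (hypothetical, for this sketch only):
    a 3D point maps from the reference to the source camera frame
    as X_src = R @ X_ref + t.
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K_src).T @ E @ np.linalg.inv(K_ref)


def epipolar_line(F, pixel_ref):
    """Line (a, b, c) in the source image with a*u + b*v + c = 0.

    Scaled so that (a, b) has unit norm, which makes the residual
    a*u + b*v + c a signed distance in pixels. Requires t != 0.
    """
    x = np.array([pixel_ref[0], pixel_ref[1], 1.0])
    line = F @ x
    return line / np.linalg.norm(line[:2])


def project_depth_candidates(K_ref, K_src, R, t, pixel_ref, depths):
    """Project one reference pixel into the source view at each depth.

    Returns an array of shape (D, 2) with source pixel coordinates.
    """
    x = np.array([pixel_ref[0], pixel_ref[1], 1.0])
    ray = np.linalg.inv(K_ref) @ x               # back-projected ray, z = 1
    pts_ref = depths[:, None] * ray[None, :]     # (D, 3) points in ref frame
    pts_src = pts_ref @ R.T + t                  # (D, 3) points in src frame
    proj = pts_src @ K_src.T                     # homogeneous pixels
    return proj[:, :2] / proj[:, 2:3]
```

Calling `project_depth_candidates` with a range of depths and checking each result against `epipolar_line` gives residuals that are zero up to floating-point error. Sampling along the line directly is therefore equivalent to sweeping depth hypotheses for that pixel.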