Keywords: Image Matching, Keypoint Detection, Sparse Matching, Semi-Dense Matching, Depth-Auxiliary
Abstract: This study introduces DXFeat, a novel architecture that integrates depth information as an auxiliary branch for keypoint detection. Leveraging depth cues improves localization accuracy by an average of 3.1% while preserving inference efficiency, because the depth branch refines feature extraction during training only. The model incorporates a modified reliability loss and learnable weighting mechanisms, balancing accuracy and robustness. By optimizing network channels while preserving high-resolution inputs, DXFeat supports both sparse and semi-dense matching, making it well-suited for visual localization and augmented reality. A depth-assisted refinement module further enhances feature representation using coarse local descriptors. Notably, the depth auxiliary branch is needed only during training, ensuring streamlined deployment. Comprehensive evaluations on MegaDepth, ScanNet, and HPatches confirm that the combination of loss-level optimization and depth-auxiliary refinement yields consistent AUC improvements, establishing DXFeat as a strong and efficient framework for real-world image matching tasks.
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 24549