Abstract: We present a reproducibility study of XFeat, a lightweight local feature extractor and matcher designed for efficient visual correspondence on resource-constrained hardware. We re-implement the architecture from the paper and supplementary material, re-evaluate the authors' released checkpoint alongside our re-implementation, and conduct additional architectural ablations to clarify insufficiently motivated design choices. The distinction between re-evaluation and reproduction is crucial, as the paper, supplement, and public code differ in several important details, including the backbone layout, the fusion block, and the training losses. Empirically, our reproduced models closely match, and in some cases slightly outperform, the re-evaluated original checkpoint on MegaDepth-1500 and ScanNet-1500, supporting the main claim that XFeat provides a strong accuracy–efficiency trade-off for real-world use. At the same time, our ablations examine two seemingly crucial architectural claims from the original paper: the parallel keypoint branch is important for semi-dense matching, but its benefit is less pronounced than the original paper claims, and the motivation for the single skip connection is less conclusive than originally implied. Finally, our experiments show that downstream computer vision tasks such as homography estimation can be reproduced successfully, whereas visual localization on Aachen remains below the paper's reported numbers even when re-evaluating the authors' own checkpoint, suggesting that the gap stems from underspecified evaluation details rather than from the model itself.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Vasileios_Belagiannis1
Submission Number: 8814