Abstract: To obtain feature descriptors, current detector-free feature matching algorithms usually apply attention at a coarse level to model relationships between keypoints. However, relying solely on global cues from other points introduces uncertainty that hinders high-quality feature matching. When humans match features, they not only look back and forth but also refer to the details surrounding each keypoint for precise localization. Based on this observation, we propose FineFormer, a novel Transformer-based detector-free matcher. FineFormer not only aggregates global cues from keypoints but also takes the local details around each keypoint into account. Extensive experiments on three datasets demonstrate that our method outperforms existing state-of-the-art methods in both efficiency and effectiveness.