Visual TransforMatcher: Efficient Match-to-Match Attention for Visual Correspondence

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: image matching, semantic correspondence, visual transformer, 4D attention
Abstract: Establishing correspondences between images remains a challenging task, especially under large appearance changes due to different viewpoints and intra-class variations. In this work, we introduce a strong image matching learner, dubbed \textit{Visual TransforMatcher}, which builds on the success of Transformers in vision domains. Unlike previous self-attention schemes over image matches, it performs match-to-match attention for precise match localization and dynamically updates matching scores in a global context. To handle the large number of candidate matches in a dense correlation map, we develop a lightweight architecture with an effective positional encoding technique for matching. In experiments, our method sets a new state of the art on the SPair-71k dataset, while performing on par with existing state-of-the-art models on the PF-PASCAL and PF-WILLOW datasets, showing the effectiveness of the proposed approach. We also provide the results of extensive ablation studies to justify the design choices of our model. The code and trained weights will be released upon acceptance.
One-sentence Summary: We propose a transformer-based, end-to-end learning pipeline for establishing semantic correspondences between a pair of images.
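To make the idea of match-to-match attention concrete, the sketch below treats every entry of a dense 4D correlation map as a token and refines its score with self-attention. This is a minimal illustration only, assuming PyTorch; the class and parameter names (MatchToMatchAttention, embed_dim, num_heads) are hypothetical and a standard multi-head attention is used in place of the paper's lightweight, efficiency-oriented design.

```python
# Minimal sketch of match-to-match attention over a dense 4D correlation
# map. Assumes PyTorch; names are illustrative, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MatchToMatchAttention(nn.Module):
    """Treat every source-target match (i, j, k, l) in a correlation map
    as a token and refine its score with multi-head self-attention."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.score_embed = nn.Linear(1, embed_dim)   # lift scalar score to a token
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.score_out = nn.Linear(embed_dim, 1)     # project back to a scalar score

    def forward(self, corr: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # corr: (B, Hs, Ws, Ht, Wt) dense correlation map
        # pos:  (B, Hs*Ws*Ht*Wt, embed_dim) positional encoding of each match
        b, hs, ws, ht, wt = corr.shape
        tokens = corr.reshape(b, -1, 1)              # one token per candidate match
        x = self.score_embed(tokens) + pos           # add match positional encoding
        x, _ = self.attn(x, x, x)                    # match-to-match attention
        refined = self.score_out(x).reshape(b, hs, ws, ht, wt)
        return corr + refined                        # residual score update


# Toy usage: correlate two small feature grids and refine the matches.
if __name__ == "__main__":
    b, c, h, w = 1, 32, 8, 8
    src = F.normalize(torch.randn(b, c, h, w), dim=1)
    tgt = F.normalize(torch.randn(b, c, h, w), dim=1)
    corr = torch.einsum("bchw,bcxy->bhwxy", src, tgt)   # (B, H, W, H, W)
    pos = torch.zeros(b, h * w * h * w, 64)             # placeholder positional encoding
    refined = MatchToMatchAttention()(corr, pos)
    print(refined.shape)  # torch.Size([1, 8, 8, 8, 8])
```

Note that naive self-attention over all H·W·H·W matches scales quadratically in the number of matches; the abstract's emphasis on a lightweight architecture and a matching-specific positional encoding addresses exactly this cost, which the sketch above does not.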