MSFORMER: Multi-Scale Transformer with Neighborhood Consensus for Feature Matching

Published: 01 Jan 2023, Last Modified: 14 Nov 2023, ICASSP 2023
Abstract: Existing feature matching methods tend to extract feature descriptors by feeding down-sampled feature maps into a Transformer that cannot extend feature scales, which leads to false correspondences for small objects. This paper proposes MSFormer, which obtains feature descriptors from Transformers placed in separate branches. In one branch, convolutions are integrated into the self-attention layers to compensate for the missing local structure information. In the other branch, a multi-scale Transformer is built by injecting heterogeneous receptive field sizes into the tokens. Additionally, a neighborhood consensus mechanism re-ranks the initial matches, imposing a geometric-consensus constraint on neighboring feature descriptors. Extensive experiments on indoor and outdoor pose estimation show that MSFormer outperforms existing state-of-the-art methods by a large margin.
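The general idea of re-ranking matches by neighborhood consensus can be illustrated with a minimal sketch. This is not MSFormer's mechanism, whose details the abstract does not give; it is a simplified stand-in under stated assumptions. Matches are assumed to be `(index_a, index_b, similarity)` triples, a match's neighborhood is assumed to be its `k` nearest matched keypoints in each image, and the new score is assumed to be the similarity multiplied by the fraction of neighbors that the match shares across both images. The function names and the parameter `k` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
# Hypothetical match format: (keypoint index in image A, keypoint index in image B, similarity)
Match = Tuple[int, int, float]


def _knn(points: List[Point], query: int, k: int) -> List[int]:
    """Indices of the k points closest to points[query], excluding the query itself."""
    qx, qy = points[query]
    dists = [
        (math.hypot(x - qx, y - qy), idx)
        for idx, (x, y) in enumerate(points)
        if idx != query
    ]
    dists.sort()  # ties are broken by index, so the result is deterministic
    return [idx for _, idx in dists[:k]]


def rerank_by_neighborhood_consensus(
    kpts_a: List[Point], kpts_b: List[Point], matches: List[Match], k: int = 4
) -> List[Match]:
    """Hypothetical re-ranking: similarity weighted by the share of neighbors agreeing in both images."""
    # Neighborhoods are built over matched keypoints only, so every neighbor has a partner.
    pts_a = [kpts_a[i] for i, _, _ in matches]
    pts_b = [kpts_b[j] for _, j, _ in matches]
    k = min(k, len(matches) - 1)
    reranked = []
    for m, (i, j, sim) in enumerate(matches):
        if k <= 0:
            consensus = 0.0
        else:
            neigh_a = set(_knn(pts_a, m, k))  # match indices near m in image A
            neigh_b = set(_knn(pts_b, m, k))  # match indices near m in image B
            # A neighbor supports m only if it is close to m in both images.
            consensus = len(neigh_a & neigh_b) / k
        reranked.append((i, j, sim * consensus))
    reranked.sort(key=lambda t: t[2], reverse=True)
    return reranked
```

For example, with two clusters of correct matches related by a pure translation and one high-similarity match whose endpoint in image B falls into the wrong cluster, that match shares no neighbors across the two images, so its score drops to zero and it moves to the bottom of the ranking. A product of similarity and consensus is only one simple way to combine the two signals; a learned or geometric verification step could replace it.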