Normalized Matching Transformer

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Keypoint Matching, Graph Matching, Normalized Transformer, Hyperspherical Learning
Abstract: We introduce the Normalized Matching Transformer (NMT), a deep learning approach for efficient and accurate sparse keypoint matching between image pairs. NMT combines a strong visual backbone, geometric feature refinement via SplineCNN, and a normalized transformer for computing matching features. Central to NMT is our hyperspherical normalization strategy: we enforce unit-norm embeddings at every transformer layer and train with a combined contrastive InfoNCE and hyperspherical uniformity loss to yield more discriminative keypoint representations. This novel architecture/loss combination encourages close alignment of matching image features and large distances between non-matching ones, not only at the output level but at each layer. Despite its architectural simplicity, NMT sets a new state of the art on PascalVOC and SPair-71k, outperforming BBGM (Rolínek et al. 2020), ASAR (Ren et al. 2022), COMMON (Lin et al. 2023) and GMTR (Guo et al. 2024) by 5.1% and 2.2%, respectively, while converging in ≥ 1.7× fewer epochs than other state-of-the-art baselines. These results underscore the power of combining pervasive normalization with hyperspherical learning for geometric matching tasks.
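To make the loss construction in the abstract concrete, below is a minimal NumPy sketch of unit-norm embeddings trained with a combined InfoNCE and hyperspherical uniformity objective. The uniformity term follows the common Wang & Isola formulation (log of the mean pairwise Gaussian potential on the sphere); the temperature `tau`, the kernel scale `t`, and the weighting `0.5` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit hypersphere.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce(a, b, tau=0.07):
    # a, b: (N, D) unit-norm embeddings where a[i] matches b[i].
    logits = a @ b.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # cross-entropy on matching pairs

def uniformity(x, t=2.0):
    # Hyperspherical uniformity: log E[exp(-t * ||u - v||^2)] over distinct pairs.
    sq = np.sum((x[:, None] - x[None, :]) ** 2, axis=-1)
    mask = ~np.eye(x.shape[0], dtype=bool)
    return np.log(np.mean(np.exp(-t * sq[mask])))

rng = np.random.default_rng(0)
a = l2_normalize(rng.normal(size=(8, 16)))          # keypoint features, image 1
b = l2_normalize(a + 0.1 * rng.normal(size=(8, 16)))  # noisy matches, image 2
loss = info_nce(a, b) + 0.5 * uniformity(np.vstack([a, b]))  # weight is assumed
```

In this sketch, alignment of matched keypoints is driven by the InfoNCE term, while the uniformity term spreads all embeddings over the hypersphere, discouraging collapsed or non-discriminative representations.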
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 8931