Matching While Perceiving: Enhance Image Feature Matching with Applicable Semantic Amalgamation

Shihua Zhang, Zhenjie Zhu, Zizhuo Li, Tao Lu, Jiayi Ma

Published: 01 Jan 2025, Last Modified: 01 Aug 2025, Venue: AAAI 2025, License: CC BY-SA 4.0

Abstract: Image feature matching is a fundamental problem in computer vision that aims to establish accurate correspondences between two views of a scene. Existing methods are constrained by the performance of their feature extractors and struggle to capture local information in regions affected by sparse texture or occlusion. Observing that humans rely not only on similar local geometric features but also on high-level semantic information about scene objects when matching images, this paper introduces SemaGlue, a novel algorithm that perceives semantic information and incorporates it into the matching process. In contrast to recent approaches that use semantic consistency to narrow the scope of matching areas, SemaGlue achieves semantic amalgamation through a purpose-built Semantic-Aware Fusion (SAF) Block, which injects rich semantic features from a pre-trained segmentation model. In addition, a Cross-Domain Alignment (CDA) Block is proposed to bridge the gap between the semantic and geometric domains, ensuring that the amalgamated semantics are applicable to matching. Extensive experiments demonstrate that SemaGlue outperforms state-of-the-art methods across applications including homography estimation, relative pose estimation, and visual localization.
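
The abstract does not specify the internals of the SAF and CDA Blocks, so the following is only a generic illustration of the idea of aligning semantic features to the descriptor space and then injecting them into local descriptors. It is a minimal NumPy sketch under stated assumptions, not the SemaGlue architecture: the function names `align_semantic` and `fuse`, the linear projection `W`, the residual cross-attention, and all tensor sizes are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def align_semantic(sem, W):
    """Hypothetical alignment step (not the paper's CDA Block):
    project semantic features into the descriptor space with a linear
    map W, then L2-normalize each projected feature."""
    proj = sem @ W
    return proj / (np.linalg.norm(proj, axis=1, keepdims=True) + 1e-8)

def fuse(desc, sem_aligned):
    """Hypothetical fusion step (not the paper's SAF Block):
    residual cross-attention in which keypoint descriptors act as
    queries and aligned semantic features act as keys and values."""
    d = desc.shape[1]
    attn = softmax(desc @ sem_aligned.T / np.sqrt(d), axis=1)  # (N, M)
    return desc + attn @ sem_aligned

# Toy inputs: all sizes and values are arbitrary placeholders.
rng = np.random.default_rng(0)
desc = rng.standard_normal((5, 8))      # 5 keypoints, 8-dim geometric descriptors
sem = rng.standard_normal((7, 16))      # 7 semantic features, 16-dim
W = rng.standard_normal((16, 8)) * 0.1  # stand-in for a learned projection

fused = fuse(desc, align_semantic(sem, W))
```

In this sketch the fused descriptors keep the shape of the original descriptors, so they could be passed to any downstream matcher unchanged. The residual form keeps the geometric signal intact while adding a bounded semantic update.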