DiffGlue: Diffusion-Aided Image Feature Matching

Shihua Zhang; Jiayi Ma

Published: 20 Jul 2024, Last Modified: 05 Aug 2024, Venue: MM2024 Poster, License: CC BY 4.0

Abstract: As one of the most fundamental problems in computer vision, image feature matching aims to establish correct correspondences between a pair of images. Existing studies enhance the descriptions of feature points with a graph neural network (GNN) and identify correspondences from the predicted assignment matrix. However, this pipeline easily falls into a suboptimal result during training because the solution space is extremely complex, and it has no access to a prior that could guide information propagation and network convergence. In this paper, we propose a novel method called DiffGlue that introduces the Diffusion Model into the sparse image feature matching framework. Concretely, based on the incrementally iterative diffusion and denoising processes, DiffGlue is guided by the prior from the Diffusion Model and trained step by step along the optimization path, progressively approaching the optimal solution. In addition, it contains a dedicated Assignment-Guided Attention that acts as a bridge between the Diffusion Model and sparse image feature matching, injecting the inherent prior into the GNN and thereby improving message delivery. Extensive experiments show that DiffGlue converges faster and to better solutions, outperforming state-of-the-art methods on several applications such as homography estimation, relative pose estimation, and visual localization. The code is available at https://github.com/SuhZhang/DiffGlue.
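
As a rough illustration of the "diffusion" half of the diffusion and denoising processes, the sketch below applies the generic DDPM-style forward noising step to a toy ground-truth assignment matrix. It assumes that the diffused variable is the assignment matrix and that a linear noise schedule is used; neither is stated in the abstract, and the names `linear_alpha_bar` and `noise_assignment` are hypothetical helpers that are not taken from the DiffGlue code base.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def noise_assignment(assignment, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    scale = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    return [[scale * x + sigma * rng.gauss(0.0, 1.0) for x in row]
            for row in assignment]


# Toy ground truth: 3 keypoints in image A matched one-to-one to 3 in image B.
x0 = [[1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0],
      [0.0, 1.0, 0.0]]

rng = random.Random(0)            # fixed seed so the sketch is reproducible
alpha_bar = linear_alpha_bar(1000)

x_early = noise_assignment(x0, 0, alpha_bar, rng)    # still close to x0
x_late = noise_assignment(x0, 999, alpha_bar, rng)   # close to pure Gaussian noise
```

Under these assumptions, a denoising network would be trained to recover the clean assignment from a sample such as `x_late`, with intermediate steps providing progressively easier targets. How DiffGlue actually parameterizes this process is described in the paper and the linked repository.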

Primary Subject Area: [Content] Vision and Language

Relevance To Conference: In this work, we propose a novel method called DiffGlue that, for the first time, introduces the Diffusion Model into the framework of image feature matching. Concretely, with the forward and backward processes, the optimization path in the complex solution space can be learned step by step, leading toward the global optimum. Additionally, we design a dedicated Assignment-Guided Attention that leverages the prior from the Diffusion Model, making the Diffusion Model applicable to sparse image feature matching and delivering messages more effectively in the GNN. Because image feature matching is a fundamental problem in computer vision, a better matching method benefits various advanced multimedia techniques, including visual localization, 3D reconstruction, novel view synthesis, and rendering. This Diffusion-based image feature matching framework can also be readily extended to multimodal image matching.

Supplementary Material: zip

Submission Number: 2378
