Towards Few-Shot Adaptation for Dense Cross-Modality Image Matching

18 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: cross-modality image matching, few-shot adaptation
TL;DR: A method for adapting a pretrained dense matcher to downstream cross-modality data using few-shot labelled samples
Abstract: Cross-modality image matching aims to establish correspondences between images captured under different sensing modalities. Recent advances in transformer-based dense matchers and large-scale synthetic training data have led to foundation models with strong generalization to unseen modalities. However, their performance degrades when the target modality diverges substantially from the pretraining distribution, making domain-specific adaptation essential. Since annotated data is often costly and limited, while unlabelled data is plentiful, we address this challenge by adapting pretrained dense matchers with a combination of few-shot labelled and abundant unlabelled samples. Specifically, we exploit the multi-scale architecture of dense matchers by using the finest-scale predictions to guide learning at coarser scales on unlabelled data. Extensive experiments across diverse modalities demonstrate that our approach consistently outperforms both foundation models and purely supervised adaptation, achieving up to 40% improvement in matching accuracy.
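The abstract's core idea, using finest-scale predictions to supervise coarser scales on unlabelled data, can be sketched as follows. This is a minimal illustrative sketch only: the function names, the NumPy flow-field representation, and the L1 consistency loss are assumptions standing in for the paper's actual matcher and training objective.

```python
import numpy as np

def downsample_flow(flow, factor):
    """Average-pool a dense flow field of shape (H, W, 2) by `factor`,
    rescaling displacement magnitudes to the coarser resolution."""
    H, W, _ = flow.shape
    h, w = H // factor, W // factor
    pooled = flow[:h * factor, :w * factor] \
        .reshape(h, factor, w, factor, 2).mean(axis=(1, 3))
    return pooled / factor  # displacements shrink with resolution

def fine_to_coarse_loss(coarse_pred, fine_pred):
    """Self-supervision on an unlabelled pair: treat the (downsampled)
    finest-scale prediction as a pseudo-label for a coarser scale."""
    factor = fine_pred.shape[0] // coarse_pred.shape[0]
    target = downsample_flow(fine_pred, factor)
    return np.mean(np.abs(coarse_pred - target))  # L1 consistency

# Hypothetical usage: a fine 8x8 flow guiding a coarse 4x4 prediction.
fine = np.full((8, 8, 2), 2.0)    # uniform 2-pixel displacement
coarse = np.full((4, 4, 2), 1.0)  # consistent at half resolution
loss = fine_to_coarse_loss(coarse, fine)  # ~0: scales agree
```

In a full training loop, this unsupervised consistency term would be combined with a standard supervised matching loss on the few labelled samples.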
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 10920