Bridging the Gap between Semantic Correspondence and Robust Visual Representation

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: semantic correspondence, foundation models
TL;DR: A state-of-the-art model designed for semantic correspondence.
Abstract: Predicting cross-image semantic correspondence among different instances of the same category is a fundamental yet challenging task in computer vision. Models must capture both high-level semantic features and low-level texture information to accurately find correspondences between pixels, and the quality of these features directly affects the matching results. Recently, models pre-trained with self-supervised methods have demonstrated promising performance in representation learning and can serve as strong backbones that provide robust visual features. However, existing methods adapt poorly to such features: their complex matching modules do not yield a significant performance boost, because they disrupt the original representation and lack high-resolution low-level information. In this work, we introduce a simple yet effective framework named ViTSC to unlock the substantial potential of self-supervised vision transformers for semantic correspondence. ViTSC comprises three key components: a cross-perception module that aligns semantic features of the same part across images while preserving the original representation as much as possible, an auxiliary loss that resolves ambiguity among semantically similar objects, and a low-level correlation-guided upsampler that generates high-resolution flow maps for precise localization. ViTSC delivers reliable semantic correspondence, surpassing previous state-of-the-art methods on all three standard benchmarks: SPair-71k, PF-PASCAL, and PF-WILLOW.
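The abstract only names the components, so the sketch below is a rough, hypothetical illustration (not the authors' implementation) of two of the ideas it describes: a residual cross-attention block that aligns features from two images while keeping the frozen backbone representation largely intact, and a dense correlation followed by a soft-argmax to turn feature similarity into a coarse flow map. All class and function names (CrossPerceptionBlock, correlation_to_flow) and hyperparameters are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossPerceptionBlock(nn.Module):
    """Hypothetical cross-attention block: tokens of one image attend to the
    other image's tokens; the residual connection keeps the original (frozen)
    ViT representation largely preserved."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, src: torch.Tensor, trg: torch.Tensor) -> torch.Tensor:
        # src, trg: (B, N, C) patch tokens from a self-supervised ViT backbone
        q = self.norm_q(src)
        kv = self.norm_kv(trg)
        aligned, _ = self.attn(q, kv, kv)
        return src + aligned  # residual preserves the backbone features


def correlation_to_flow(src_feat: torch.Tensor, trg_feat: torch.Tensor,
                        h: int, w: int, temperature: float = 0.05) -> torch.Tensor:
    """Dense correlation + soft-argmax producing a coarse flow map.
    src_feat, trg_feat: (B, N, C) with N = h * w patch tokens."""
    src_feat = F.normalize(src_feat, dim=-1)
    trg_feat = F.normalize(trg_feat, dim=-1)
    corr = torch.einsum('bnc,bmc->bnm', src_feat, trg_feat)        # (B, N, N) similarities
    prob = F.softmax(corr / temperature, dim=-1)                   # matching distribution
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2).float().to(prob.device)
    matched = prob @ grid                                          # expected target position
    flow = matched - grid                                          # displacement per source patch
    return flow.view(-1, h, w, 2)
```

In the paper's pipeline, a coarse flow like this would presumably be refined by the low-level correlation-guided upsampler to reach pixel-level resolution; that stage is not sketched here.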
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6936