Segment Anything Model is a Good Teacher for Local Feature Learning

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Local Feature Learning, Local Feature Detection and Description, Computer Vision, Deep Learning, AI
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose SAMFeat, an approach for learning local image features by distilling knowledge from the Segment Anything Model (SAM).
Abstract: Local feature detection and description play an important role in many computer vision tasks; they are designed to detect and describe keypoints in "any scene" and for "any downstream task". Data-driven local feature learning methods rely on pixel-level correspondences for training, which are challenging to acquire at scale, hindering further performance improvement. In this paper, we propose SAMFeat, which introduces SAM (the Segment Anything Model), a foundation model trained on 11 million images, as a teacher to guide local feature learning and thus achieve higher performance on limited datasets. First, we construct an auxiliary task of Pixel Semantic Relational Distillation (PSRD), which distills feature relations carrying the category-agnostic semantic information learned by the SAM encoder into the local feature learning network, thereby improving local feature description through semantic discrimination. Second, we develop Weakly Supervised Contrastive Learning Based on Semantic Grouping (WSC), which uses semantic groupings derived from SAM as weakly supervised signals to optimize the metric space of local descriptors. Third, we design an Edge Attention Guidance Module (EAGM) that further improves the accuracy of local feature detection and description by prompting the network to attend to the edge regions highlighted by SAM. Experiments on image matching (HPatches) and long-term visual localization (Aachen Day-Night) show that SAMFeat outperforms previous local features. The release code is available in the supplementary material.
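To make the PSRD objective above concrete, here is a minimal PyTorch sketch of relational distillation between the SAM encoder (teacher) and the local-feature network (student). The function name, tensor shapes, the pixel subsampling, and the MSE loss over cosine-similarity relation matrices are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def pixel_semantic_relational_distillation(teacher_feats, student_feats, max_pixels=4096):
    """Sketch of PSRD: match pairwise pixel-affinity (relation) matrices
    of the teacher and student, rather than the raw features themselves.

    teacher_feats: (B, Ct, Ht, Wt) features from the frozen SAM encoder.
    student_feats: (B, Cs, H, W) descriptor map from the student network.
    """
    B, _, H, W = student_feats.shape
    # Resize teacher features to the student's spatial resolution.
    t = F.interpolate(teacher_feats, size=(H, W), mode="bilinear", align_corners=False)

    # Flatten to (B, N, C) pixel sets and L2-normalize each pixel's feature.
    t = F.normalize(t.flatten(2).transpose(1, 2), dim=-1)              # (B, N, Ct)
    s = F.normalize(student_feats.flatten(2).transpose(1, 2), dim=-1)  # (B, N, Cs)

    # Optionally subsample pixels: the (N x N) relation matrix grows
    # quadratically with N = H * W, so full resolution is impractical.
    if max_pixels is not None and t.shape[1] > max_pixels:
        idx = torch.randperm(t.shape[1], device=t.device)[:max_pixels]
        t, s = t[:, idx], s[:, idx]

    # Pairwise cosine-similarity (pixel relation) matrices, shape (B, N, N).
    rel_t = t @ t.transpose(1, 2)
    rel_s = s @ s.transpose(1, 2)

    # Penalize the discrepancy between teacher and student relations.
    return F.mse_loss(rel_s, rel_t)
```

Because the loss compares relation matrices instead of features directly, the student is free to use a different feature dimensionality than the SAM encoder while still inheriting its category-agnostic semantic structure.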
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5119