DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines

Published: 20 Jul 2024 · Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Fine-grained image retrieval (FGIR) aims to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose various techniques to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis that leads to practical guidelines for designing high-performance FGIR models, i.e., identifying subcategory-specific discrepancies and generating discriminative features. These guidelines include emphasizing the object (G1), highlighting subcategory-specific discrepancies (G2), and employing an effective training strategy (G3). Following G1 and G2, we design a novel Dual Visual Filtering mechanism for the plain vision transformer, denoted as DVF, to capture subcategory-specific discrepancies. Specifically, the dual visual filtering mechanism comprises an object-oriented module and a semantic-oriented module. These components serve to magnify objects and identify discriminative regions, respectively. Following G3, we implement a discriminative model training strategy to improve the discriminability and generalization ability of DVF. Extensive analysis and ablation studies confirm the efficacy of our proposed guidelines. Without bells and whistles, our DVF achieves state-of-the-art performance on three widely-used fine-grained datasets in closed-set and open-set settings.
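To make the two-stage filtering idea concrete, below is a minimal, hypothetical PyTorch sketch of how an object-oriented filter (keeping the most object-relevant patch tokens) could feed a semantic-oriented filter (attending to discriminative regions). All module names, the token-scoring/top-k selection, and the class-token attention pooling are illustrative assumptions, not the authors' actual DVF implementation.

```python
import torch
import torch.nn as nn

class ObjectOrientedFilter(nn.Module):
    """Hypothetical object-oriented filtering (G1): score patch tokens and keep the
    top-k most object-relevant ones, a stand-in for "magnifying the object"."""
    def __init__(self, dim, keep_ratio=0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.keep_ratio = keep_ratio

    def forward(self, tokens):                     # tokens: (B, N, D) ViT patch tokens
        scores = self.score(tokens).squeeze(-1)    # (B, N) object-relevance scores
        k = max(1, int(tokens.size(1) * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices        # indices of the k highest-scoring patches
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        return torch.gather(tokens, 1, idx)        # (B, k, D) retained object tokens

class SemanticOrientedFilter(nn.Module):
    """Hypothetical semantic-oriented filtering (G2): use the class token as a query
    to attend over the retained tokens and pool the most discriminative regions."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, cls_token, tokens):          # cls_token: (B, 1, D), tokens: (B, k, D)
        pooled, _ = self.attn(cls_token, tokens, tokens)
        return pooled.squeeze(1)                   # (B, D) retrieval embedding

# Toy usage with random features standing in for vision-transformer outputs.
B, N, D = 2, 196, 384
patch_tokens = torch.randn(B, N, D)
cls_token = torch.randn(B, 1, D)
obj_filter = ObjectOrientedFilter(D)
sem_filter = SemanticOrientedFilter(D)
embedding = sem_filter(cls_token, obj_filter(patch_tokens))
print(embedding.shape)  # torch.Size([2, 384])
```

In this sketch the first stage discards background patches before the second stage pools over what remains; the actual DVF modules and training strategy are described in the paper.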
Primary Subject Area: [Engagement] Multimedia Search and Recommendation
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: This paper introduces a multi-modal vision foundation model to solve fine-grained image retrieval tasks using multi-modal input, thereby expanding the scope of multi-modal content for fine-grained image retrieval.
Submission Number: 1810