Abstract: Previous asymmetric image retrieval methods based on knowledge distillation have primarily focused on aligning the global features of two networks to transfer global semantic information from the gallery network to the query network. However, these methods often fail to effectively transfer local semantic information, limiting the fine-grained alignment of feature representation spaces between the two networks. To overcome this limitation, we propose a novel approach called Layered-Granularity Localized Distillation (GranDist). GranDist constructs layered feature representations that balance the richness of contextual information with the granularity of local features. As we progress through the layers, the contextual information becomes more detailed, but the semantic gap between networks can widen, complicating the transfer process. To address this challenge, GranDist decouples the feature maps at each layer to capture local features at different granularities and establishes distillation pipelines focused on effectively transferring these contextualized local features. In addition, we introduce an Unambiguous Localized Feature Selection (UnamSel) method, which leverages a well-trained fully connected layer to classify these contextual features as either ambiguous or unambiguous. By discarding the ambiguous features, we prevent the transfer of irrelevant or misleading information, such as background elements that are not pertinent to the retrieval task. Extensive experiments on various benchmark datasets demonstrate that our method outperforms state-of-the-art techniques and significantly enhances the performance of previous asymmetric retrieval approaches.
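The abstract describes UnamSel only at a high level: a well-trained fully connected layer classifies localized features as ambiguous or unambiguous, and ambiguous ones are discarded. The sketch below illustrates one plausible reading of that idea, in which a feature counts as unambiguous when the FC classifier assigns it a confident prediction; the confidence threshold `tau` and the max-softmax criterion are assumptions for illustration, not details from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class dimension.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def select_unambiguous(local_feats, W, b, tau=0.5):
    """Keep localized features on which the classifier is confident.

    local_feats: (N, D) array of localized features from one layer
    W, b:        parameters of a (hypothetical) well-trained FC layer,
                 shapes (D, C) and (C,)
    tau:         confidence threshold; an illustrative assumption,
                 not a value or criterion taken from the paper
    Returns the retained features and the boolean keep mask.
    """
    logits = local_feats @ W + b
    conf = softmax(logits).max(axis=-1)  # max class probability per feature
    keep = conf > tau                    # treat low-confidence features as ambiguous
    return local_feats[keep], keep
```

Under this reading, features dominated by background content would tend to receive diffuse class probabilities and fall below `tau`, so they are excluded from the distillation targets.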