Abstract: Fine-Grained Image Retrieval (FGIR) has become a crucial research area within fine-grained analysis. Despite extensive progress in FGIR, a central problem remains: existing methods rely mainly on object-level representations, which are disturbed by background clutter. In this paper, we instead propose to learn part-level representations. We first present a novel Attention-Activation-based Part Detector (AAPD), which requires no part-level annotations, to extract part-level features and suppress background noise. AAPD not only localizes a discriminative part automatically through an attention mechanism but also selects the part with high activation values in an unsupervised way. We then propose a novel unified Multiple Part-level Feature Ensemble (MPFE) framework that assembles several part-level features extracted by AAPDs. Finally, we evaluate the proposed MPFE on two widely used benchmark datasets, CUB-200-2011 and Cars-196. The MPFE framework is capable of localizing discriminative parts and achieves state-of-the-art performance on both datasets.