Scalable Bag of Selected Deep Features for Visual Instance Retrieval

Published: 2018, Last Modified: 17 Sept 2025, MMM (2) 2018, CC BY-SA 4.0
Abstract: Recent studies show that aggregating the activations of convolutional layers from CNN models into a global descriptor leads to promising performance for instance retrieval. However, due to the global pooling strategy adopted, the generated feature representation lacks discriminative local structure information and is degraded by irrelevant image patterns and background clutter. In this paper, we propose a novel Bag-of-Deep-Visual-Words (BoDVW) model for instance retrieval. Activations of convolutional feature maps are extracted as a set of individual semantic-aware local features. An energy-based feature selection is adopted to filter out poorly distinctive features on homogeneous background. To make local feature-level cross matching scalable, the local deep CNN features are quantized to fit an inverted index structure. A new cross-matching metric is defined to measure image similarity. Our approach achieves respectable performance in comparison to other state-of-the-art methods. In particular, it proves to be more effective and efficient on large-scale datasets.
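To make the described pipeline more concrete, the following is a minimal NumPy sketch of the two ingredients the abstract names: treating each spatial position of a convolutional feature map as a local descriptor, selecting descriptors by activation energy, and quantizing the survivors against a visual-word codebook for inverted-index lookup. The L2-norm energy criterion, the mean-energy threshold, the codebook size, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def select_local_features(feature_map, energy_ratio=0.5):
    """Treat each spatial location of a conv feature map (C x H x W) as a
    C-dimensional local descriptor and keep only locations whose activation
    energy (L2 norm) exceeds a fraction of the mean energy.

    The mean-energy threshold is an illustrative choice, not necessarily
    the selection rule used in the paper."""
    C, H, W = feature_map.shape
    descriptors = feature_map.reshape(C, H * W).T        # (H*W, C) local descriptors
    energies = np.linalg.norm(descriptors, axis=1)       # per-location activation energy
    keep = energies > energy_ratio * energies.mean()     # drop low-energy (background) locations
    return descriptors[keep]

def quantize(descriptors, codebook):
    """Assign each selected descriptor to its nearest codebook centroid
    (visual word id), so images can be stored in an inverted index."""
    # squared Euclidean distance from every descriptor to every centroid
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

if __name__ == "__main__":
    # Toy usage with random data standing in for real CNN activations.
    rng = np.random.default_rng(0)
    fmap = rng.random((512, 7, 7)).astype(np.float32)     # e.g. a conv5-like feature map
    codebook = rng.random((64, 512)).astype(np.float32)   # assumed k-means vocabulary
    local_feats = select_local_features(fmap)
    words = quantize(local_feats, codebook)
    print(local_feats.shape, words[:10])
```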