Deep Fisher-Vector Descriptors for Image Retrieval and Scene Recognition

Published: 01 Jan 2024, Last Modified: 14 Nov 2024
Venue: MVRMLM@ICMR 2024
License: CC BY-SA 4.0
Abstract: This study presents an architecture for large-scale image retrieval and recognition. We introduce a novel multi-stream Fisher vector network that integrates a convolutional neural network (CNN) with a Fisher Vector (FV) framework for feature extraction and aggregation. The CNN component generates dense deep convolutional descriptors, which the Fisher Vector method then aggregates to improve recognition accuracy. Importantly, the CNN and Fisher Vector parameters are learnt simultaneously in an end-to-end manner, which lets the model account for the changing distribution of deep descriptors over the course of training. This joint learning strategy yields a robust model that performs strongly on both image retrieval and recognition tasks, as demonstrated on standard datasets.
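
To make the aggregation step concrete, the following is a minimal sketch of Fisher Vector encoding in NumPy. The abstract does not specify the exact formulation, so this assumes the standard improved Fisher Vector over a diagonal-covariance Gaussian mixture model, with power and L2 normalization. The function name `fisher_vector` and its arguments are hypothetical, and the descriptors would in practice come from the CNN rather than from a random generator. In an end-to-end setting the mixture parameters would be trainable alongside the CNN weights in an autodiff framework; this sketch covers only the forward computation.

```python
import numpy as np


def fisher_vector(x, weights, means, sigmas):
    """Encode local descriptors as an improved Fisher Vector (illustrative sketch).

    x       : (T, D) local descriptors, e.g. dense CNN activations
    weights : (K,)   GMM mixture weights, summing to 1
    means   : (K, D) GMM component means
    sigmas  : (K, D) GMM per-dimension standard deviations (diagonal covariance)
    Returns a vector of length 2 * K * D.
    """
    T, D = x.shape

    # Whitened differences between every descriptor and every component: (T, K, D)
    z = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]

    # Log of weight times Gaussian density for each descriptor/component pair: (T, K)
    log_prob = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(z ** 2, axis=2)
        - np.sum(np.log(sigmas), axis=1)[None, :]
        - 0.5 * D * np.log(2.0 * np.pi)
    )

    # Soft assignments (posteriors), computed with a stabilized softmax over components
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradient statistics with respect to the means and standard deviations: (K, D) each
    g_mu = (gamma[:, :, None] * z).sum(axis=0) / (T * np.sqrt(weights)[:, None])
    g_sigma = (gamma[:, :, None] * (z ** 2 - 1.0)).sum(axis=0) / (
        T * np.sqrt(2.0 * weights)[:, None]
    )

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Power normalization followed by L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / max(np.linalg.norm(fv), 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D = 4, 8
    descriptors = rng.normal(size=(50, D))  # stand-in for CNN descriptors
    fv = fisher_vector(
        descriptors,
        weights=np.full(K, 1.0 / K),
        means=rng.normal(size=(K, D)),
        sigmas=np.ones((K, D)),
    )
    print(fv.shape)
```

Every operation here is differentiable almost everywhere, which is what makes joint training of the mixture parameters and the descriptor extractor feasible in principle.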