Abstract: With the growth of various types of media data (e.g., text, image, video, and audio), fine-grained cross-media retrieval, which aims to provide a flexible and accurate query service, has attracted significant attention. Unlike traditional keyword-based retrieval, the queries and results in fine-grained cross-media retrieval may be of different media types. In this work, we demonstrate that the quality of retrieval results can be further improved by additionally considering media-specific information. Moreover, the feature extraction process should differ for queries of different media types. To this end, we propose a novel network architecture, the Double Branch Fine-grained Cross-media Net (DBFC-Net), which is the first to use media-specific information to construct common features within a uniform framework. Furthermore, we devise an effective distance metric (cosine+) for fine-grained cross-media retrieval. Compared with commonly used metrics (e.g., the cosine function), the proposed cosine+ metric is better suited to fine-grained retrieval scenarios. Extensive experiments and ablation studies on publicly available datasets demonstrate the effectiveness of the proposed approach.
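
For orientation, the baseline the abstract compares against, ranking items of any media type by the cosine function over common features, might be sketched as follows. This is only an illustrative sketch of plain cosine ranking: the abstract does not define cosine+, so it is not shown, and the function names, the three-dimensional feature vectors, and the gallery keys are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def rank_by_cosine(query, gallery):
    """Return gallery keys sorted from most to least similar to the query."""
    scores = {key: cosine_similarity(query, feat) for key, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical common-space features: an image query against items of other media types.
query_image = [0.9, 0.1, 0.0]
gallery = {
    "text_a": [0.8, 0.2, 0.1],
    "audio_b": [0.0, 1.0, 0.0],
    "video_c": [0.0, 0.1, 0.9],
}
ranking = rank_by_cosine(query_image, gallery)
print(ranking)
```

Because every item is compared in one common feature space, the same ranking routine serves a query of any media type; only the upstream feature extraction would differ.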