Fusing feature and similarity for multimodal search

Guoli Song, Shuhui Wang, Qi Tian

Published: 2015, Last Modified: 07 Jan 2026, ChinaSIP 2015, CC BY-SA 4.0

Abstract: It is well known that fusing multiple sources of information can improve the retrieval performance of multimedia systems. However, what to fuse and how to fuse it remain open questions for multimodal correlation learning. In this paper, we address the problem of combining multiple resources to strengthen multimodal correlation learning. We propose two fusion strategies: multi-feature fusion and multi-similarity fusion. For multi-feature fusion, feature concatenation is used to integrate various features. For multi-similarity fusion, three fusion rules are investigated: MIN, MAX, and weighted AVG. The effectiveness of the fusion strategies is evaluated on several state-of-the-art multimodal correlation learning models for cross-modal retrieval tasks. Results suggest that, with a properly selected fusion strategy, multimodal retrieval performance can be significantly enhanced.
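
The two strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `concat_features` and `fuse_similarities`, the list-based score layout (one score per candidate, one list per feature or model), and the normalization of the AVG weights by their sum are all assumptions made for illustration, since the abstract does not give these details.

```python
from typing import List, Optional, Sequence


def concat_features(*features: Sequence[float]) -> List[float]:
    """Multi-feature fusion (sketch): concatenate feature vectors end to end."""
    fused: List[float] = []
    for vec in features:
        fused.extend(vec)
    return fused


def fuse_similarities(
    sims: Sequence[Sequence[float]],
    rule: str = "avg",
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Multi-similarity fusion (sketch): combine per-candidate scores.

    sims holds one score list per similarity source; all lists must have
    the same length. rule is "min", "max", or "avg" (weighted average).
    """
    if not sims:
        raise ValueError("at least one similarity list is required")
    n = len(sims[0])
    if any(len(s) != n for s in sims):
        raise ValueError("all similarity lists must have the same length")

    columns = list(zip(*sims))  # scores for the same candidate, grouped
    if rule == "min":
        return [min(col) for col in columns]
    if rule == "max":
        return [max(col) for col in columns]
    if rule == "avg":
        # Assumption: uniform weights by default, normalized by their sum.
        w = list(weights) if weights is not None else [1.0] * len(sims)
        if len(w) != len(sims):
            raise ValueError("need exactly one weight per similarity list")
        total = sum(w)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return [sum(wi * v for wi, v in zip(w, col)) / total for col in columns]
    raise ValueError(f"unknown fusion rule: {rule!r}")
```

For example, with hypothetical score lists `[0.9, 0.2, 0.5]` and `[0.4, 0.6, 0.5]` for three candidates, the MIN rule keeps the more pessimistic score per candidate and the MAX rule the more optimistic one, while the weighted AVG rule lets one source count more than the other through `weights`.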

External IDs: dblp:conf/chinasip/SongWT15