Cross-modal correlation learning for clustering on image-audio dataset

Hong Zhang, Yueting Zhuang, Fei Wu

Published: 2007, Last Modified: 16 May 2023, ACM Multimedia 2007, Readers: Everyone

Abstract: Exploring correlations between different datasets and exploiting them to cluster those datasets is both interesting and challenging. Cross-modal correlation between images and audio clips can help identify images (or audio clips) with particular semantics. However, the heterogeneity of visual and auditory features makes this cross-modal correlation difficult to learn. In this paper, we first analyze the canonical correlation between the feature matrices of images and audio clips during subspace mapping; second, we design a correlation-based similarity reinforcement for images and audio clips; third, we perform image clustering and audio clustering with affinity propagation. Experimental results on an image-audio dataset are encouraging and show that our approach is effective. We also present an application: querying images by audio examples.
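The pipeline outlined in the abstract (canonical correlation analysis for subspace mapping, followed by affinity propagation clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `cca`, `similarity_matrix`, and `affinity_propagation`, the regularization term `reg`, the damping value, and the median-based preference are all assumptions made for illustration. The paper's correlation-based similarity reinforcement is not specified in the abstract, so plain negative squared Euclidean distance in the mapped subspace is used here as a stand-in.

```python
import numpy as np

def cca(X, Y, n_components, reg=1e-6):
    """Classical CCA via whitening and SVD (illustrative, not the paper's code).

    X: (n, p) feature matrix of one modality, Y: (n, q) of the other,
    with rows paired. Returns Wx, Wy and the canonical correlations.
    """
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    # Regularized covariance blocks (reg is an assumed small ridge term).
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Map singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return Wx, Wy, s[:n_components]

def similarity_matrix(Z):
    """Negative squared Euclidean distance; diagonal set to the median
    off-diagonal value as the preference (an assumed, common default)."""
    S = -np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    off_diag = S[~np.eye(len(Z), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diag))
    return S

def affinity_propagation(S, damping=0.5, n_iter=200):
    """Message-passing affinity propagation on a similarity matrix S.

    Returns the exemplar indices and, for each point, the position of its
    exemplar in that list (-1 everywhere if no exemplar emerged).
    """
    n = S.shape[0]
    idx = np.arange(n)
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(n_iter):
        # Responsibility update: r(i,k) = s(i,k) - max_{k' != k} (a + s)(i,k').
        AS = A + S
        first = AS.argmax(axis=1)
        first_val = AS[idx, first]
        AS[idx, first] = -np.inf
        second_val = AS.max(axis=1)
        R_new = S - first_val[:, None]
        R_new[idx, first] = S[idx, first] - second_val
        R = damping * R + (1 - damping) * R_new
        # Availability update from positive responsibilities.
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = diag
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels
```

Under these assumptions, paired image and audio feature matrices would be passed to `cca`, each modality projected with its own matrix (for example `(X - X.mean(axis=0)) @ Wx`), and the projected points clustered by calling `affinity_propagation(similarity_matrix(Z))`. Because both modalities end up in a shared subspace, the same distances could also rank images against an audio query, in the spirit of the application mentioned at the end of the abstract.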
