Abstract: Exploring correlations between different datasets, and using those correlations to cluster them, is both interesting and challenging. Cross-modal correlation between images and audio clips can help identify images (or audio clips) that carry particular semantics. However, visual and auditory features are heterogeneous, which makes this correlation difficult to learn. In this paper, we first analyze the canonical correlation between the feature matrices of images and audio clips during subspace mapping; second, we design a correlation-based similarity reinforcement for images and audio clips; third, we cluster images and audio clips with affinity propagation. Experimental results on an image-audio dataset are encouraging and indicate that our approach is effective. We also present an application: querying images by audio examples.
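To make the final clustering step concrete, the following is a minimal sketch of affinity propagation on a precomputed similarity matrix, written in plain Python. It is not the authors' implementation and it omits the canonical correlation analysis and the similarity reinforcement described above. The toy one-dimensional points, the negative squared distance used as similarity, the median preference, and the `damping` and `iterations` values are all assumptions chosen for illustration; in the setting of the paper the matrix `S` would instead hold the reinforced image or audio similarities.

```python
from statistics import median


def affinity_propagation(S, damping=0.7, iterations=300):
    """Cluster n items given an n x n similarity matrix S (list of lists).

    S[k][k] is the "preference" of item k for being an exemplar.
    Returns (exemplars, labels), where labels[i] is the exemplar index
    assigned to item i. Requires n >= 2.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)

        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # An item is an exemplar when its self-responsibility plus
    # self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [-1] * n

    # Exemplars label themselves; other items pick the most similar exemplar.
    labels = [
        i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
        for i in range(n)
    ]
    return exemplars, labels


# Hypothetical toy data: two well-separated groups of 1-D "features".
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
n = len(points)

# Similarity: negative squared distance (an assumption, not from the paper).
S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]

# Preference: median of the off-diagonal similarities, a common default.
preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
for k in range(n):
    S[k][k] = preference

exemplars, labels = affinity_propagation(S)
print("exemplars:", exemplars)
print("labels:", labels)
```

Affinity propagation does not need the number of clusters in advance; the preference on the diagonal controls how many exemplars emerge, with lower values yielding fewer clusters.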