Multi-view Multi-label Canonical Correlation Analysis for Cross-modal Matching and Retrieval

CVPR Workshops 2022
Abstract: In this paper, we address the problem of cross-modal retrieval in the presence of multi-view and multi-label data. For this, we present Multi-view Multi-label Canonical Correlation Analysis (MVMLCCA), a generalization of CCA to multi-view data that also exploits the high-level semantic information available in the form of multi-label annotations in each view. While CCA relies on explicit pairings/associations of samples between two views (or modalities), MVMLCCA uses the available multi-label annotations to establish correspondence across multiple (two or more) views without the need for explicit pairing of multi-view samples. Extensive experiments on two multi-modal datasets demonstrate that the proposed approach offers far more flexibility than related approaches without compromising scalability or cross-modal retrieval performance. Our code and precomputed features are available at https://github.com/Rushil231100/MVMLCCA.
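To make the core idea concrete, below is a minimal two-view sketch of how multi-label annotations can replace explicit sample pairings in a CCA-style objective: the cross-view covariance is bridged by label co-occurrence (Y1 @ Y2.T) rather than by one-to-one sample correspondence. This is an illustrative assumption, not the paper's exact formulation; the function name `mvmlcca_two_view`, the regularization scheme, and the label-similarity weighting are all hypothetical choices for the sketch.

```python
import numpy as np

def mvmlcca_two_view(X1, Y1, X2, Y2, dim=10, reg=1e-3):
    """Label-bridged CCA sketch (hypothetical): views are related through
    label co-occurrence Y1 @ Y2.T instead of explicitly paired samples.

    X1: (n1, d1) features, Y1: (n1, c) binary labels for view 1
    X2: (n2, d2) features, Y2: (n2, c) binary labels for view 2
    """
    # Center features within each view.
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    # Cross-view covariance weighted by label similarity (assumed form):
    # samples sharing labels contribute correspondence mass.
    C12 = X1.T @ (Y1 @ Y2.T) @ X2
    # Regularized within-view covariances.
    C11 = X1.T @ X1 + reg * np.eye(X1.shape[1])
    C22 = X2.T @ X2 + reg * np.eye(X2.shape[1])
    # Whiten via Cholesky factors and take top singular directions,
    # exactly as in the standard SVD solution of two-view CCA.
    A = np.linalg.inv(np.linalg.cholesky(C11))  # L1^{-1}
    B = np.linalg.inv(np.linalg.cholesky(C22))  # L2^{-1}
    U, _, Vt = np.linalg.svd(A @ C12 @ B.T)
    W1 = A.T @ U[:, :dim]     # projection for view 1
    W2 = B.T @ Vt[:dim].T     # projection for view 2
    return W1, W2

# Example usage with random data (shapes only, no real semantics):
X1, Y1 = np.random.randn(100, 64), (np.random.rand(100, 5) > 0.7).astype(float)
X2, Y2 = np.random.randn(80, 32), (np.random.rand(80, 5) > 0.7).astype(float)
W1, W2 = mvmlcca_two_view(X1, Y1, X2, Y2, dim=8)
Z1, Z2 = X1 @ W1, X2 @ W2  # comparable embeddings for cross-modal retrieval
```

Because the correspondence matrix Y1 @ Y2.T is defined for any pair of views sharing a label vocabulary, the same construction extends to more than two views without requiring the views to have paired samples, which is the flexibility the abstract refers to.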