Highlights
• A hierarchical learning architecture is proposed to handle weakly-paired heterogeneous multi-modal data.
• A classifier shared across modalities in the learned representation space is imposed to exploit the label information.
• A low-rank model is introduced to characterize within-class similarity in the learned representation space.

Abstract
Much multi-modal data exhibits significant weak pairing, i.e., there is no sample-to-sample correspondence between modalities; rather, classes of samples in one modality correspond to classes of samples in the other. This poses great challenges for cross-modal learning for retrieval. In this work, we focus on learning cross-modal representations with minimal class-label supervision and without correspondences between samples. To tackle this challenging problem, we establish a scalable hierarchical learning architecture for extensive weakly-paired heterogeneous multi-modal data. A classifier shared across modalities effectively exploits the label supervision, and a multi-modal low-rank model encourages modality-invariant representations. Finally, cross-modal validation experiments on publicly available datasets demonstrate the advantages of the proposed method.
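To make the three highlighted components concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes two modalities that share only class labels (no sample-level pairing), flattens the hierarchical architecture to a single encoder layer per modality, and uses the nuclear norm as one common surrogate for a within-class low-rank model. All names, dimensions, and the choice of nuclear norm are our assumptions.

```python
import torch
import torch.nn as nn

class SharedClassifierModel(nn.Module):
    """Hypothetical sketch: per-modality encoders into a common space,
    plus one classifier shared by both modalities."""

    def __init__(self, dim_img, dim_txt, dim_latent, num_classes):
        super().__init__()
        # Modality-specific encoders map heterogeneous inputs into one
        # common representation space (the paper's hierarchy is deeper).
        self.enc_img = nn.Sequential(nn.Linear(dim_img, dim_latent), nn.ReLU())
        self.enc_txt = nn.Sequential(nn.Linear(dim_txt, dim_latent), nn.ReLU())
        # A single classifier shared across modalities exploits the
        # class-level (weak) supervision.
        self.classifier = nn.Linear(dim_latent, num_classes)

    def forward(self, x_img, x_txt):
        z_img = self.enc_img(x_img)
        z_txt = self.enc_txt(x_txt)
        return z_img, z_txt, self.classifier(z_img), self.classifier(z_txt)

def low_rank_penalty(z_img, z_txt, y_img, y_txt, num_classes):
    """Nuclear norm of each within-class block: representations of one
    class, pooled across modalities, are pushed toward a low-rank
    (hence modality-invariant) subspace."""
    penalty = z_img.new_zeros(())
    for c in range(num_classes):
        block = torch.cat([z_img[y_img == c], z_txt[y_txt == c]], dim=0)
        if block.shape[0] > 1:
            penalty = penalty + torch.linalg.matrix_norm(block, ord="nuc")
    return penalty

if __name__ == "__main__":
    # Toy usage with random data; unequal batch sizes emphasize that no
    # sample-to-sample correspondence is assumed, only shared labels.
    model = SharedClassifierModel(dim_img=512, dim_txt=300, dim_latent=128, num_classes=10)
    x_img, y_img = torch.randn(64, 512), torch.randint(0, 10, (64,))
    x_txt, y_txt = torch.randn(80, 300), torch.randint(0, 10, (80,))
    z_i, z_t, logits_i, logits_t = model(x_img, x_txt)
    loss = (nn.functional.cross_entropy(logits_i, y_img)
            + nn.functional.cross_entropy(logits_t, y_txt)
            + 0.1 * low_rank_penalty(z_i, z_t, y_img, y_txt, num_classes=10))
    loss.backward()
```

The trade-off weight (0.1 here) between the shared-classifier loss and the low-rank term is arbitrary in this sketch; in practice it would be a tuned hyperparameter.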