Abstract: Cross-modal hashing encodes different modalities of multi-modal data into a low-dimensional Hamming space for fast cross-modal retrieval. Most existing cross-modal hashing methods heavily rely on label semantics to boost retrieval performance; however, such semantics are expensive to collect in real applications. To mitigate this heavy reliance on semantics, this work proposes a new semi-supervised deep cross-modal hashing method, namely, Graph Convolutional Semi-Supervised Cross-Modal Hashing (GCSCH), which is trained with limited label supervision. The proposed GCSCH first generates pseudo-multi-labels for the unlabeled samples using the simple yet effective ideas of consistency regularization and pseudo-labeling. GCSCH designs a fusion network that merges the two modalities and employs a Graph Convolutional Network (GCN) to capture semantic information among ground-truth-labeled and pseudo-labeled multi-modal data. Using the idea of knowledge distillation, GCSCH employs a teacher-student learning scheme that transfers knowledge from the fusion module to the image and text hashing networks. Empirical studies on three multi-modal benchmark datasets demonstrate the superiority of the proposed GCSCH over state-of-the-art cross-modal hashing methods with limited label supervision.
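To make the pseudo-labeling step concrete, the following is a minimal sketch of consistency-regularization-based pseudo-multi-label generation in PyTorch. It assumes a FixMatch-style scheme with weak and strong augmentations and a confidence threshold; all names (`weak_aug`, `strong_aug`, `threshold`) are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def pseudo_multilabel_loss(model, x_unlabeled, weak_aug, strong_aug,
                           threshold=0.95):
    """Hypothetical consistency-regularization step: generate
    pseudo-multi-labels from a weakly augmented view and enforce
    agreement on a strongly augmented view."""
    with torch.no_grad():
        # Per-class probabilities on the weakly augmented view
        # (sigmoid, since the labels are multi-labels, not one-hot).
        probs = torch.sigmoid(model(weak_aug(x_unlabeled)))
        # Keep only confident predictions, in either direction.
        mask = (probs > threshold) | (probs < 1.0 - threshold)
        pseudo_labels = (probs > 0.5).float()
    # Consistency loss: the strongly augmented view should reproduce
    # the confident pseudo-multi-labels.
    logits_strong = model(strong_aug(x_unlabeled))
    loss = F.binary_cross_entropy_with_logits(
        logits_strong, pseudo_labels, reduction="none"
    )
    return (loss * mask.float()).mean(), pseudo_labels
```

In such a scheme, only predictions above (or below) the confidence threshold contribute to the loss, so unreliable pseudo-labels on unlabeled samples are ignored early in training.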
Primary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Cross-modal hashing encodes different modalities of multi-modal data into a low-dimensional Hamming space for fast cross-modal retrieval. This work proposes a new semi-supervised deep cross-modal hashing method, namely, Graph Convolutional Semi-Supervised Cross-Modal Hashing (GCSCH), which is trained with limited label supervision. This work contributes to the processing of partially labeled multi-modal data and effectively supports fast image-text cross-modal retrieval.
Submission Number: 1691