Dual-Mix for Cross-Modal Retrieval with Noisy Labels

Feng Ding, Xiu Liu, Xinyi Wang, Fangming Zhong

Published: 2024 (ICASSP 2024). Last Modified: 21 Apr 2026. License: CC BY-SA 4.0
Abstract: Cross-modal retrieval with deep neural networks relies heavily on accurate annotation. However, existing methods suffer from the scarcity and limited validity of annotations due to the expensive cost of manual labeling, and noisy labels are inevitably introduced during the labeling process. It is therefore worthwhile to explore the potential of noisy labels in cross-modal retrieval. In this work, we propose a novel framework, Dual-Mix for Cross-Modal Retrieval with noisy labels (DMCM), which consists of two components: mixing robust loss functions and mixing augmentation for noisy samples. In the first mixing stage, the normalized generalized cross entropy and mean absolute error losses are combined so that they complement each other. Then, after separating clean and noisy samples with a Beta Mixture Model, we mix these samples via augmentation to further alleviate the scarcity of labeled samples. Extensive experiments demonstrate the significant superiority of our DMCM.
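The abstract's first mixing stage combines two robust classification losses, the normalized generalized cross entropy (NGCE) and the mean absolute error (MAE). A minimal sketch of such a combination is given below; the weighting scheme (`alpha`, `beta`) and the GCE exponent `q` are assumptions for illustration, not the paper's reported hyperparameters.

```python
import numpy as np

def ngce(probs, labels, q=0.7):
    """Normalized generalized cross entropy (sketch).

    probs: (N, C) predicted class probabilities, labels: (N,) integer labels.
    GCE per class is (1 - p^q) / q; NGCE normalizes the target-class GCE
    by the sum of GCE over all classes, which bounds the loss per sample.
    """
    gce_all = (1.0 - probs ** q) / q                       # (N, C)
    gce_y = gce_all[np.arange(len(labels)), labels]        # (N,)
    return gce_y / gce_all.sum(axis=1)

def mae(probs, labels):
    """MAE between the predicted distribution and the one-hot target.

    For a one-hot target this reduces to 2 * (1 - p_y), a bounded,
    noise-robust loss.
    """
    p_y = probs[np.arange(len(labels)), labels]
    return 2.0 * (1.0 - p_y)

def mixed_robust_loss(probs, labels, alpha=1.0, beta=1.0, q=0.7):
    # Weighted sum of the two robust losses; alpha/beta are illustrative.
    return (alpha * ngce(probs, labels, q) + beta * mae(probs, labels)).mean()
```

Both component losses are bounded, so mislabeled samples cannot dominate the gradient the way they do under plain cross entropy; combining them lets the NGCE term retain some gradient signal where MAE saturates.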