Abstract: Cross-modal retrieval (CMR) has attracted significant attention for its flexibility in querying across multiple modalities. However, gathering well-annotated multi-modal data is time-consuming and expensive. Although using annotations from non-experts can reduce this overhead, it inevitably introduces noisy labels. Existing methods often overlook the common scenario of multi-label data with noisy annotations. To tackle this challenge, we introduce a Deep noisy Multi-label Learning framework for Robust cross-modal retrieval (DMLR) that learns robustly from noisy labels. Specifically, DMLR comprises two mechanisms. A Cross-modal Label Refurbishment mechanism (CLR) generates soft labels for each instance from ensemble predictions, ensuring the correct optimization direction and thereby mitigating the impact of noisy labels. A Robust noisy Multi-label Learning mechanism (RML) rapidly learns basic patterns and mitigates the optimization risks stemming from noisy labels through an adaptive robust semantic-invariant loss, while narrowing the heterogeneity gap with a cross-modal gap loss, thereby yielding discriminative and semantic-invariant representations. Finally, extensive experiments on three widely used datasets comparing DMLR with 12 state-of-the-art methods demonstrate the effectiveness of the proposed approach.
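The label-refurbishment idea mentioned above can be illustrated with a minimal sketch: blend the (possibly noisy) multi-hot labels with the averaged predictions of an ensemble to obtain soft targets. The function name `refurbish_labels` and the mixing weight `alpha` are hypothetical and not the paper's exact formulation.

```python
import numpy as np

def refurbish_labels(noisy_labels, ensemble_preds, alpha=0.6):
    """Blend noisy multi-hot labels with averaged ensemble predictions
    to form soft labels. `alpha` is an assumed mixing weight; the paper's
    actual refurbishment rule may differ."""
    # Average the per-member probability predictions over the ensemble axis.
    avg_pred = np.mean(ensemble_preds, axis=0)
    # Convex combination keeps soft labels in [0, 1].
    return alpha * noisy_labels + (1.0 - alpha) * avg_pred

# Toy example: 2 instances, 3 labels, 2 ensemble members.
noisy = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
preds = np.array([[[0.9, 0.1, 0.2],
                   [0.1, 0.8, 0.1]],
                  [[0.7, 0.3, 0.4],
                   [0.2, 0.6, 0.3]]])
soft = refurbish_labels(noisy, preds)
print(soft)  # soft targets per instance and label, each in [0, 1]
```

In a multi-label setting the blend is applied per label independently, so an instance can retain high confidence on several classes at once.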