Robust Contrastive Cross-modal Hashing with Noisy Labels

Published: 20 Jul 2024, Last Modified: 01 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Cross-modal hashing has emerged as a promising technique for retrieving relevant information across distinct media types thanks to its low storage cost and high retrieval efficiency. However, the success of most existing methods heavily relies on large-scale, well-annotated datasets, which are costly and scarce in the real world due to ubiquitous labeling noise. To tackle this problem, in this paper we propose a novel framework, termed Noise Resistance Cross-modal Hashing (NRCH), to learn hashing with noisy labels by overcoming two key challenges, i.e., noise overfitting and error accumulation. Specifically, i) to mitigate the overfitting issue caused by noisy labels, we present a novel Robust Contrastive Hashing loss (RCH) that targets homologous pairs instead of noisy positive pairs, thus avoiding overemphasizing noise. In other words, RCH enforces the model to focus on more reliable positives instead of unreliable ones constructed from noisy labels, thereby enhancing the robustness of the model against noise; ii) to circumvent error accumulation, a Dynamic Noise Separator (DNS) is proposed to dynamically and accurately separate clean from noisy samples by adaptively fitting the loss distribution, thus alleviating the adverse influence of noise on iterative training. Finally, we conduct extensive experiments on four widely used benchmarks to demonstrate the robustness of our NRCH against noisy labels for cross-modal retrieval. The code is available at: https://github.com/LonganWANG-cs/NRCH.git.
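The abstract describes DNS as separating clean from noisy samples by "adaptively fitting the loss distribution". The exact formulation is not given on this page; a common instantiation of this idea (used by several noisy-label methods) fits a two-component Gaussian mixture to per-sample training losses and treats the low-loss component as the clean set. The sketch below, in pure Python, is an illustrative assumption of that pattern, not the paper's actual implementation; `fit_two_gmm` and `split_clean_noisy` are hypothetical names.

```python
import math

def fit_two_gmm(losses, iters=50):
    """Fit a two-component 1-D Gaussian mixture to per-sample losses via EM.
    Returns, for each sample, the posterior probability of belonging to the
    low-mean ("clean") component. Illustrative sketch only."""
    lo, hi = min(losses), max(losses)
    mu = [lo, hi]                              # init means at the extremes
    sigma = [(hi - lo) / 4 + 1e-6] * 2
    pi = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each loss value
        resp = []
        for x in losses:
            p = [pi[k] * math.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
                 / (sigma[k] * math.sqrt(2 * math.pi)) for k in range(2)]
            s = sum(p) + 1e-12
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate means, stds, and mixing weights
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, losses)) / nk
            sigma[k] = math.sqrt(var) + 1e-6
            pi[k] = nk / len(losses)
    clean = 0 if mu[0] < mu[1] else 1          # low-loss component = clean
    return [r[clean] for r in resp]

def split_clean_noisy(losses, tau=0.5):
    """Partition sample indices by thresholding the clean posterior at tau."""
    post = fit_two_gmm(losses)
    clean_idx = [i for i, p in enumerate(post) if p > tau]
    noisy_idx = [i for i, p in enumerate(post) if p <= tau]
    return clean_idx, noisy_idx
```

For example, given a bimodal loss list such as `[0.1, 0.12, 0.09, 2.0, 2.1]`, the first three indices would be assigned to the clean partition. The "dynamic" aspect in NRCH would correspond to re-fitting this mixture as training progresses, so the clean/noisy split adapts each epoch rather than being fixed once.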
Primary Subject Area: [Engagement] Multimedia Search and Recommendation
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This work contributes to multimedia/multimodal processing in several significant ways:
1. Robustness against noisy labels: By presenting the Robust Contrastive Hashing (RCH) loss, the approach directly targets the common issue of noisy labels in large datasets. This is particularly relevant for multimedia processing, where data from various sources can be inconsistent or incorrectly annotated.
2. Resource efficiency: Cross-modal hashing is known for its low storage cost and high retrieval speed. The proposed methods likely enhance these aspects by reducing the dependency on large-scale, well-annotated datasets, which are expensive and difficult to procure.
3. Benchmarking: Conducting extensive experiments on widely recognized benchmarks demonstrates the practical applicability of the proposed methods and provides a comparative analysis with existing techniques, showcasing the improvements in handling noisy data.
Overall, this research addresses critical challenges in cross-modal hashing and offers solutions that could be widely adopted for more efficient and robust multimedia/multimodal processing systems.
Supplementary Material: zip
Submission Number: 103