SHE: Streaming-media Hashing Retrieval

Published: 01 May 2025 (Last Modified: 18 Jun 2025) · ICML 2025 poster · CC BY 4.0
TL;DR: This paper reveals and studies a practical but less-touched problem in cross-modal hashing, i.e., streaming-media hashing retrieval.
Abstract: Recently, numerous cross-modal hashing (CMH) methods have been proposed, yielding remarkable progress. As a static learning paradigm, existing CMH methods often implicitly assume that all modalities are prepared before processing. In practical applications (such as multi-modal medical diagnosis), however, it is very challenging to collect paired multi-modal data simultaneously. Instead, such data are collected chronologically, forming streaming-media data (SMA). To handle such data, all previous CMH methods require retraining on data from all modalities, which inevitably limits the scalability and flexibility of the model. In this paper, we propose a novel CMH paradigm named Streaming-media Hashing rEtrieval (SHE) that enables parallel training of each modality. Specifically, we first propose a knowledge library mining module (KLM) that extracts a prototype knowledge library for each modality, thereby revealing the commonality distribution of the instances from each modality. Then, we propose a knowledge library transfer module (KLT) that updates and aligns the new knowledge by utilizing the historical knowledge library, ensuring semantic consistency. Finally, to enhance intra-class semantic relevance and inter-class semantic disparity, we develop a discriminative hashing learning module (DHL). Comprehensive experiments on four benchmark datasets demonstrate the superiority of our SHE compared to 14 competitors.
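To make the prototype-library idea concrete, the sketch below illustrates the general mechanism the abstract describes: mining one prototype per class from a modality's features, then blending newly mined prototypes with a historical library. This is only an illustrative analogue under simple assumptions (class-mean prototypes, momentum-style blending); the function names and the momentum parameter are hypothetical and not the paper's actual KLM/KLT implementation.

```python
import numpy as np

def build_knowledge_library(features, labels, num_classes):
    """Mine one prototype per class as the class-mean feature.

    A minimal stand-in for a KLM-style module: each row of the returned
    library summarizes the commonality of one class within a modality.
    """
    dim = features.shape[1]
    library = np.zeros((num_classes, dim))
    for c in range(num_classes):
        library[c] = features[labels == c].mean(axis=0)
    return library

def align_to_library(new_library, historical_library, momentum=0.9):
    """Blend newly mined prototypes with the historical library.

    A rough analogue of a KLT-style update: the historical library anchors
    semantics, while new knowledge nudges each prototype (hypothetical
    momentum rule, not the paper's transfer objective).
    """
    return momentum * historical_library + (1.0 - momentum) * new_library

# Toy example: two classes, 2-D features arriving from one modality stream.
feats = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
labels = np.array([0, 0, 1, 1])
library = build_knowledge_library(feats, labels, num_classes=2)
updated = align_to_library(library, np.zeros_like(library), momentum=0.5)
```

Because each modality only needs the shared library (not the other modalities' raw data), such a design can train each incoming stream independently, which is the scalability benefit the paradigm targets.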
Lay Summary: Cross-modal hashing (CMH) techniques have achieved impressive progress in retrieving related content across different modalities such as images and text. However, most existing CMH methods assume that all modalities are available at once, which is unrealistic in many real-world scenarios, like medical diagnostics, where data from different modalities arrive sequentially. This creates a challenge for current CMH approaches, as they require full retraining whenever new data arrives, limiting scalability and flexibility. To address this, we propose a new CMH paradigm called Streaming-media Hashing rEtrieval (SHE), which supports asynchronous, parallel training for each modality. SHE introduces a Knowledge Library Mining (KLM) module to capture the semantic commonalities within each stream. A Knowledge Library Transfer (KLT) module then ensures semantic consistency by aligning newly arrived data with historical knowledge. To improve discriminative power, a Discriminative Hashing Learning (DHL) module enhances intra-class similarity and inter-class separation. This work provides a scalable, flexible solution for real-time multimodal retrieval, significantly advancing the applicability of CMH in dynamic, real-world settings.
Primary Area: General Machine Learning->Representation Learning
Keywords: cross-modal retrieval, cross-modal hashing, streaming-media
Submission Number: 5444