Abstract: With the rapid development of artificial intelligence and multimedia technology, cross-modal hashing (CMH) has been widely applied in multimedia retrieval, recommendation systems, and large-scale data search due to its efficient query processing and low storage requirements, and has become a pivotal research area in both academia and industry. However, existing CMH algorithms fall short in exploiting potential inter-modal correlations, leading to considerable semantic gaps. To overcome this issue, this paper proposes an innovative CMH framework called Covariance Attention Guidance Mamba Hashing (CAGMH) for cross-modal retrieval. The framework enables deeper semantic alignment between modalities through a novel multi-feature fusion mechanism, which narrows the semantic gap and enhances the expressive power of each modality. Specifically, CAGMH exploits the distributional properties of covariance to optimize hash code generation and combines this with the Mamba strategy to further improve cross-modal retrieval robustness. In addition, we design a novel loss function computation strategy that combines modal correlation with semantic consistency to improve the model's convergence and generalization ability. Experiments on four public benchmark datasets show that CAGMH surpasses state-of-the-art CMH methods, offering improved accuracy and efficiency in large-scale cross-modal similarity search. The corresponding code is available at https://github.com/Rooike111/CAGMH.