Compact-Yet-Separate: Proto-Centric Multi-Modal Hashing With Pronounced Category Differences for Multi-Modal Retrieval

Published: 2025 · Last Modified: 05 Jan 2026 · IEEE Trans. Multim. 2025 · CC BY-SA 4.0
Abstract: Multi-modal hashing represents complex, heterogeneous multi-modal data with compact hash codes, achieving low storage costs and high retrieval speeds and thereby addressing the inefficiency and resource intensiveness of traditional multi-modal retrieval methods. However, existing works still struggle to balance intra-class compactness with inter-class separability, owing to coarse-grained feature limitations, simplified fusion strategies that overlook semantic complementarity, and neglect of the structural information within multi-modal data. To address these limitations, we propose a Proto-centric Multi-modal Hashing with Pronounced Category Differences (PMH-PCD) model. PMH-PCD first learns modality-specific prototypes by deeply exploring within-modality class information, ensuring that each modality's unique characteristics are fused effectively. It then learns multi-modal integrated class prototypes that incorporate semantic information across modalities, capturing the intricate relationships and complementary semantics embedded in the multi-modal data. Additionally, to generate more discriminative and representative binary hash codes, PMH-PCD integrates multifaceted semantic information, encompassing both low-level pairwise relations and high-level structural patterns, thereby capturing fine-grained data details while leveraging underlying structure. Experimental results demonstrate that PMH-PCD achieves superior and consistent performance on multi-modal retrieval tasks compared with existing state-of-the-art methods.
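The pipeline the abstract describes (modality-specific prototypes, fused multi-modal prototypes, binary hash codes) can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-based prototypes, the weighted fusion coefficient `alpha`, and the random projection `W` standing in for the learned hash function are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, n_classes, n_bits = 60, 16, 3, 8
labels = rng.integers(0, n_classes, size=n)
# Two synthetic modality feature matrices (e.g., image and text embeddings).
img_feats = rng.normal(size=(n, d))
txt_feats = rng.normal(size=(n, d))

def class_prototypes(feats, labels, n_classes):
    """Modality-specific prototypes: per-class mean of within-modality features
    (a simple stand-in for the learned prototypes in the paper)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])

# Step 1: modality-specific prototypes from within-modality class information.
proto_img = class_prototypes(img_feats, labels, n_classes)
proto_txt = class_prototypes(txt_feats, labels, n_classes)

# Step 2: multi-modal integrated class prototypes (here, a naive weighted fusion;
# alpha is an assumed hyperparameter, not from the paper).
alpha = 0.5
proto_fused = alpha * proto_img + (1 - alpha) * proto_txt

# Step 3: binary hash codes; a fixed random projection W replaces the learned one.
W = rng.normal(size=(d, n_bits))
fused_feats = alpha * img_feats + (1 - alpha) * txt_feats
codes = np.sign(fused_feats @ W)          # {-1, +1} codes, n_bits per sample
proto_codes = np.sign(proto_fused @ W)    # one prototype code per class

# Retrieval: Hamming distance from each sample code to each class prototype code.
hamming = (n_bits - codes @ proto_codes.T) / 2
pred = hamming.argmin(axis=1)
print(codes.shape, pred.shape)  # → (60, 8) (60,)
```

The {-1, +1} encoding makes the Hamming distance computable via an inner product, which is the usual trick behind the low storage cost and fast retrieval the abstract mentions.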