Complementarity is the king: Multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval
Abstract: Highlights
• Proposes a multi-modal and multi-grained hierarchical semantic enhancement network.
• Handles cross-modal retrieval by bridging the heterogeneity gap and granularity gap.
• Obtains the primary and auxiliary similarity via two subnetworks.
• Designs the multi-spring balance loss to adaptively optimize the similarity.