Abstract: Highlights
• We introduce text semantics into both inter-modality matching and learning.
• We match inter-modality positive clusters based on dual semantics.
• Text semantic consistency loss is introduced for modality-invariant learning.
External IDs: dblp:journals/ivc/GuoP25