GADNet: Improving image-text matching via graph-based aggregation and disentanglement

Published: 01 Jan 2025, Last Modified: 05 Mar 2025. Venue: Pattern Recognit. 2025. License: CC BY-SA 4.0
Abstract: Highlights
• A graph-based framework is proposed for cross-modal aggregation and disentanglement.
• Multi-granularity semantic consistency learning measures original vs. disentangled representations.
• Extensive experiments on Flickr30K and MS-COCO datasets demonstrate our method’s superiority.