Abstract: Highlights
• Proposes a global-local fusion approach for image-text matching.
• Establishes a global similarity matching module.
• Measures matching results flexibly through dynamic fusion.
• Proposes a training mechanism based on adversarial sample generation.
• Adjusts the proportion of the global and local modules through loss adjustment.