Multi-level Local Alignment and Semantic Matching Network for Image-Text Retrieval

Zhukai Jiang, Zhichao Lian

Published: 2022, Last Modified: 13 Nov 2023, ICANN (3) 2022, Readers: Everyone

Abstract: Image-text retrieval is a challenging task in the field of vision and language. Existing methods mainly compute the similarity of an image-text pair by aligning image regions with text words. Although these methods, which rely on fine-grained local features, achieve good results, they only explore the correspondence between salient objects and ignore the deeper semantic information expressed by the image and the text as a whole. We therefore propose a novel multi-level local alignment and semantic matching network (MLASM), which introduces a multi-level semantic matching module after local alignment. This module supplies the model with richer semantic information for understanding the complex correlations between images and texts. Experimental results on two benchmark datasets, Flickr30K and MS-COCO, show that MLASM achieves state-of-the-art performance.
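
To make the region-word alignment mentioned in the abstract concrete, the following is a minimal sketch of one common way such a local similarity can be scored: each word is matched to its most similar image region by cosine similarity, and the per-word scores are averaged. This is a generic illustration, not the MLASM architecture; the function names (`cosine`, `local_alignment_score`) and the toy feature vectors are hypothetical, and the abstract does not specify the pooling or similarity function actually used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_u * norm_v)


def local_alignment_score(regions, words):
    """Average, over words, of the similarity to the best-matching region.

    regions: list of region feature vectors (hypothetical detector output)
    words:   list of word feature vectors (hypothetical text encoder output)
    """
    if not regions or not words:
        return 0.0
    best_per_word = [max(cosine(w, r) for r in regions) for w in words]
    return sum(best_per_word) / len(best_per_word)


# Toy 2-D features, chosen by hand for illustration only.
regions = [[1.0, 0.0], [0.0, 1.0]]
words_match = [[2.0, 0.0], [0.0, 3.0]]       # each word points at one region
words_mismatch = [[-1.0, 0.0], [0.0, -1.0]]  # each word opposes one region

matched = local_alignment_score(regions, words_match)
mismatched = local_alignment_score(regions, words_mismatch)
```

A score of this kind depends only on the best local matches, which is the limitation the abstract points to: it carries no signal about the meaning of the image or sentence as a whole.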
