Abstract: Image-text retrieval (ITR) is a fundamental and challenging task that aims to retrieve semantically relevant images (texts) given text (image) queries. It has broad applications in search systems, online shopping, and social networks. Its primary challenge is measuring the similarity between the two modalities of vision and language. Most previous works rely on extracting instance-level feature representations and overlook how external commonsense knowledge can enhance these features. In this paper, we propose KGFC, a method that leverages knowledge graphs and the Faiss library to improve both the representation of the two modalities and retrieval efficiency. First, we construct a Knowledge Graph Enhanced Embedding (KGEE) module. Following existing approaches, we build a knowledge graph to obtain node concept representations, which are then filtered, weighted, and integrated to enrich the instance-level features. Additionally, we use the Faiss library to create indices for the retrieval database and store them offline, significantly boosting retrieval efficiency. Extensive experiments on the MSCOCO benchmark demonstrate the superiority of our method in terms of retrieval accuracy.
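The offline-indexing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not call Faiss: it is a pure-Python, brute-force stand-in for what a flat inner-product index does, so that it runs without extra dependencies. The function names (`build_index`, `load_index`, `search`), the JSON file format, and the toy embeddings are all hypothetical. In Faiss itself, the corresponding steps would typically use `faiss.IndexFlatIP`, `faiss.write_index`, and `faiss.read_index`.

```python
import json
import math
import os
import tempfile


def l2_normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_index(embeddings, path):
    """Offline step: normalize the gallery embeddings once and store them on disk."""
    index = [l2_normalize(v) for v in embeddings]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    """Online step: load the precomputed index instead of re-encoding the gallery."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def search(index, query, k):
    """Return the top-k (score, id) pairs by cosine similarity (exhaustive scan)."""
    q = l2_normalize(query)
    scored = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(index)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


# Toy data (assumption): three made-up image embeddings and one text query.
image_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
index_path = os.path.join(tempfile.gettempdir(), "toy_image_index.json")
build_index(image_embeddings, index_path)

index = load_index(index_path)
text_query = [0.9, 0.1, 0.0]
top2 = search(index, text_query, k=2)
```

The design point is that the gallery side is encoded and indexed once, so each query only costs one encoding plus a similarity search. A library such as Faiss replaces the exhaustive scan shown here with optimized exact or approximate search.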
External IDs: dblp:conf/webi/WangZS24