Image-text Retrieval: A Survey on Recent Research and Development

10 Nov 2022, OpenReview Archive Direct Upload, Readers: Everyone
Abstract: In the past few years, cross-modal image-text retrieval (ITR) has attracted increasing interest in the research community owing to its research value and broad real-world applications. ITR targets scenarios in which the queries come from one modality and the retrieval gallery from another. This paper presents a comprehensive and up-to-date survey of ITR approaches from four perspectives. By dissecting an ITR system into two processes, feature extraction and feature alignment, we summarize recent advances in ITR approaches from these two perspectives. On top of this, efficiency-focused studies of ITR systems are introduced as the third perspective. To keep pace with current developments, we also provide a pioneering overview of cross-modal pre-training ITR approaches as the fourth perspective. Finally, we outline the common benchmark datasets and evaluation metric for ITR, and compare the accuracy of representative ITR approaches. Some critical yet less-studied issues are discussed at the end of the paper.