Point to Rectangle Matching for Image Text Retrieval

2022 (modified: 18 Nov 2022), ACM Multimedia 2022, Readers: Everyone
Abstract: Image-text retrieval is made harder by one-to-many correspondence: a given query can match multiple semantically valid items in the other modality. Prevailing methods nevertheless adopt a deterministic embedding strategy, encoding the representation of each modality as a single point in vector space and retrieving the most similar candidate. We argue that, despite the progress it has enabled, such a deterministic point mapping is insufficient to represent the set of potential retrieval results that one-to-many correspondence implies. To remedy this, we propose Point to Rectangle Matching (P2RM), a geometric representation learning method for image-text retrieval. Our intuition is that the representations of different modalities can be extended from points to rectangles, so that the set of points inside a rectangle embedding can be semantically related to many candidate correspondences. P2RM can thereby address one-to-many correspondence directly. We also design a novel distance-based semantic similarity measure for our rectangle embedding. Under an evaluation metric for multiple matches, extensive experiments and ablation studies on two commonly used benchmarks demonstrate the effectiveness of P2RM in handling the multiplicity of image-text retrieval.
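To make the geometric idea concrete, the following is a minimal sketch of matching points against a rectangle embedding. It is not the paper's formulation: the abstract does not specify the similarity measure, so the sketch assumes an axis-aligned rectangle parameterized by a center and per-dimension half-widths, and uses the Euclidean distance from a point to that rectangle (zero for points inside). The function names `point_to_rectangle_distance` and `rank_candidates`, and the toy 2-D vectors, are hypothetical.

```python
from math import sqrt


def point_to_rectangle_distance(point, center, half_width):
    """Euclidean distance from a point to an axis-aligned rectangle.

    The rectangle is assumed (for illustration only) to be given by a
    center and non-negative per-dimension half-widths. Points inside
    the rectangle, or on its boundary, have distance 0.
    """
    if not (len(point) == len(center) == len(half_width)):
        raise ValueError("point, center and half_width must have equal length")
    squared = 0.0
    for p, c, h in zip(point, center, half_width):
        # How far the point lies outside the rectangle along this axis.
        gap = max(abs(p - c) - h, 0.0)
        squared += gap * gap
    return sqrt(squared)


def rank_candidates(center, half_width, candidates):
    """Order candidate names by increasing distance to the query rectangle.

    `candidates` maps a name to its point embedding. Ties (for example,
    several candidates inside the rectangle) are broken by name.
    """
    scored = [
        (point_to_rectangle_distance(vec, center, half_width), name)
        for name, vec in candidates.items()
    ]
    return [name for _, name in sorted(scored)]


# Toy query rectangle and candidate points (made-up values).
query_center = (0.0, 0.0)
query_half_width = (1.0, 1.0)
toy_candidates = {
    "a": (0.5, 0.5),   # inside the rectangle
    "b": (-0.9, 0.2),  # inside the rectangle
    "c": (3.0, 1.0),   # outside along one axis
    "d": (2.0, 2.0),   # outside along both axes
}
ranking = rank_candidates(query_center, query_half_width, toy_candidates)
```

In this toy setup both "a" and "b" fall inside the query rectangle and tie at distance zero, which illustrates how one rectangle can be related to several candidates at once, whereas a single query point would generally have one nearest neighbor.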