Calibrating Probabilistic Embeddings for Cross-Modal Retrieval

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Abstract: The core of cross-modal retrieval is to measure the content similarity between data of different modalities. The main challenge lies in learning a shared representation space for multiple modalities in which the similarity measurement reflects semantic closeness. The multiplicity of correspondences further escalates this challenge, since all possible matches should be ranked ahead of the negatives. Probabilistic embeddings have been proposed to handle this multiplicity, but they suffer from similarity miscalibration. To address this, we propose to calibrate the similarity for probabilistic embeddings. The key idea is to estimate the density ratio between the distributions of the two modalities and to use it to calibrate the similarity measurement in the embedding space. To the best of our knowledge, we are the first to study miscalibration in probabilistic embeddings. In addition, we evaluate three pre-training tasks for language models, a factor that is important for cross-modal retrieval but seldom investigated in previous studies. Extensive experiments and ablation studies on two benchmarks demonstrate the superior performance of our method in tackling the multiplicity of cross-modal retrieval.
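To make the key idea concrete, below is a minimal sketch of density-ratio-based similarity calibration. The abstract does not specify the estimator, so the classifier-based density-ratio trick, the Gaussian toy embeddings, and all names here (img_embs, txt_embs, calibrated_similarity) are illustrative assumptions, not the paper's actual method.

# A minimal sketch of density-ratio-based similarity calibration (hedged;
# not the authors' exact formulation). We assume Gaussian-style point
# embeddings per modality and estimate the ratio p_img(z)/p_txt(z) with
# the classic classifier trick (logistic regression on modality labels).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 16

# Toy embedding means for each modality (stand-ins for a trained encoder).
img_embs = rng.normal(0.3, 1.0, size=(500, d))   # hypothetical image embeddings
txt_embs = rng.normal(0.0, 1.0, size=(500, d))   # hypothetical text embeddings

# Density-ratio trick: train a classifier to tell the modalities apart;
# its logit approximates log p_img(z) - log p_txt(z) up to a prior term.
X = np.vstack([img_embs, txt_embs])
y = np.concatenate([np.ones(len(img_embs)), np.zeros(len(txt_embs))])
clf = LogisticRegression(max_iter=1000).fit(X, y)

def log_density_ratio(z):
    # Approximate log p_img(z)/p_txt(z) from the classifier logit.
    return clf.decision_function(z.reshape(1, -1))[0]

def raw_similarity(z_img, z_txt):
    # Uncalibrated similarity: negative squared distance between embeddings.
    return -np.sum((z_img - z_txt) ** 2)

def calibrated_similarity(z_img, z_txt):
    # Shift the raw score by the estimated log density ratio at the text
    # embedding; one plausible calibration scheme (an assumption).
    return raw_similarity(z_img, z_txt) + log_density_ratio(z_txt)

print(f"calibrated similarity: {calibrated_similarity(img_embs[0], txt_embs[0]):.3f}")

The classifier trick is chosen here only because it is a standard, easily reproduced density-ratio estimator; the paper may well use a different estimator or apply the ratio to the similarity in a different way.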