Tackling the Retrieval Trilemma with Cross-Modal Indexing

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: cross-modal retrieval, retrieval trilemma, cross-modal indexing
TL;DR: We propose Cross-Modal Indexing, a novel paradigm that directly maps a query to the identifiers of relevant candidates, achieving high accuracy, fast speed, and low storage simultaneously.
Abstract: Current cross-modal retrieval methods still struggle with the retrieval trilemma: simultaneously satisfying the three key requirements of high accuracy, fast speed, and low storage. For example, cross-modal embedding methods usually suffer from either slow query speed caused by time-consuming modality interaction or the tremendous memory cost of dense vector storage, while cross-modal hashing methods typically sacrifice accuracy due to the lossy discrete quantization used for vector compression. In this paper, we tackle the retrieval trilemma with a new paradigm named Cross-Modal Indexing (CMI) that directly maps queries to the identifiers of the final retrieved candidates. Specifically, we first pre-define sequential identifiers (SIDs) for all candidates via a hierarchical tree that preserves the semantic structure of the data. We then train an encoder-decoder network that maps queries to SIDs under the supervision of the constructed SIDs. Finally, we directly sample the SIDs of relevant candidates for each query with O(1) time complexity. By avoiding the unfavorable modality interaction, dense vector storage, and vector compression, the proposed CMI reaches a satisfactory balance in the retrieval trilemma. For example, experiments demonstrate that CMI achieves comparable accuracy with about 1000x storage reduction and 120x speedup compared to state-of-the-art methods on several popular image-text retrieval benchmarks.
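To make the SID construction step concrete, the following is a minimal sketch of one plausible way to build such sequential identifiers: recursively cluster candidate embeddings into a hierarchical tree, so that the path of cluster indices from root to leaf becomes each candidate's SID. The abstract does not specify the clustering algorithm; the tiny k-means routine, the branching factor `k`, the `max_leaf` cutoff, and the function names here are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def tiny_kmeans(x, k, iters=10, seed=0):
    """Minimal k-means (illustrative stand-in for any clustering step)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            pts = x[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels

def build_sids(embeddings, ids, k=2, max_leaf=2, prefix=()):
    """Assign each candidate a sequential identifier (SID): the tuple of
    cluster indices on its root-to-leaf path in a hierarchical tree,
    so semantically similar candidates share SID prefixes."""
    if len(ids) <= max(max_leaf, k):
        # Leaf node: disambiguate remaining items with a final position digit.
        return {cid: prefix + (i,) for i, cid in enumerate(ids)}
    labels = tiny_kmeans(embeddings, k)
    if len(np.unique(labels)) < 2:
        # Degenerate split (all points in one cluster): stop recursing.
        return {cid: prefix + (i,) for i, cid in enumerate(ids)}
    sids = {}
    for j in range(k):
        mask = labels == j
        sub_ids = [cid for cid, m in zip(ids, mask) if m]
        if sub_ids:
            sids.update(build_sids(embeddings[mask], sub_ids,
                                   k, max_leaf, prefix + (j,)))
    return sids

# Toy example: two well-separated groups of candidate embeddings.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1]])
sids = build_sids(emb, ids=list(range(8)))
```

With such a tree in place, an encoder-decoder model can be trained to emit a SID digit by digit given a query, and decoding a fixed-length SID is what gives the O(1) retrieval cost claimed in the abstract.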
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)