Keywords: Information search, Relevance search, Nearest neighbor search, Relevance-based embeddings, Recommendation systems
TL;DR: The idea is to describe each query (item) by its relevances to a set of support items (queries) and to use these new representations to obtain query (item) embeddings.
Abstract: In many machine learning applications, the items most relevant to a particular query must be retrieved efficiently. The relevance function is usually an expensive similarity model, which makes exhaustive search infeasible. A typical solution is to train another model that separately embeds queries and items into a vector space where similarity is defined via the dot product or cosine similarity. This allows one to find the most relevant objects through fast approximate nearest neighbor search, at the cost of some reduction in quality. To compensate for this reduction, the retrieved candidates are re-ranked by the expensive similarity model. In this paper, we investigate an alternative approach that utilizes the relevance scores of the expensive model to construct relevance-based embeddings (RBE). The idea is to describe each query (item) by its relevances to a set of support items (queries) and to use these new representations to obtain query (item) embeddings. We theoretically prove that relevance-based embeddings are powerful enough to approximate any complex similarity model (under mild conditions). An important ingredient of RBE is the choice of support items. We investigate several selection strategies and demonstrate that significant improvements can be obtained compared to random choice. Our experiments on diverse datasets illustrate the power of relevance-based embeddings.
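A minimal NumPy sketch of the construction described in the abstract is given below. Everything in it is illustrative rather than taken from the paper: f is a hypothetical stand-in for the expensive similarity model, the support sets are chosen at random (the paper studies better selection strategies), and the Nystrom-style pseudo-inverse correction is just one simple way to turn relevance vectors into dot-product embeddings, not necessarily the authors' exact construction.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical expensive similarity model f(q, i); here an arbitrary
    # nonlinear function of raw query/item features, used only for illustration.
    def f(Q, I):
        return np.tanh(Q @ I.T) + 0.1 * (Q @ I.T) ** 2

    queries = rng.normal(size=(200, 16))
    items = rng.normal(size=(500, 16))

    # Support sets, chosen at random here (the paper shows better strategies).
    S_q = queries[rng.choice(200, size=32, replace=False)]  # support queries
    S_i = items[rng.choice(500, size=32, replace=False)]    # support items

    # Relevance-based representations: each query is described by its
    # relevances to the support items, and each item by the support
    # queries' relevances to it.
    R_q = f(queries, S_i)   # shape (200, 32)
    R_i = f(S_q, items).T   # shape (500, 32)
    C = f(S_q, S_i)         # cross-relevances between the support sets

    # One simple way to obtain dot-product embeddings from these
    # representations is a Nystrom-style correction:
    #   f(q, i) ~= R_q @ pinv(C) @ R_i.T
    emb_q = R_q @ np.linalg.pinv(C)  # query embeddings
    emb_i = R_i                      # item embeddings

    approx = emb_q @ emb_i.T         # cheap dot-product scores
    exact = f(queries, items)        # expensive exact relevances
    print("mean abs error:", np.abs(approx - exact).mean())

In this toy setup the item embeddings can be precomputed offline, and at query time only the 32 relevances to the support items are evaluated before running a standard approximate nearest neighbor search over emb_i.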
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7889