Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods

Published: 01 Jan 2024, Last Modified: 10 Oct 2024 · CVPR 2024 · CC BY-SA 4.0
Abstract: Retrieval tasks play a central role in real-world machine learning systems such as search engines, recommender systems, and retrieval-augmented generation (RAG). Achieving decent performance in these tasks often requires fine-tuning various pretrained models on specific datasets and selecting the best candidate, a process that can be both time- and resource-consuming. To tackle this problem, we introduce a novel and efficient method, called RetMMD, that leverages Maximum Mean Discrepancy (MMD) and kernel methods to assess the transferability of pretrained models in retrieval tasks. RetMMD is computed on a pretrained model and the target dataset without any fine-tuning involved. Specifically, for a given query, we quantify the distribution discrepancy between relevant and irrelevant document embeddings by estimating the similarities of their mappings in the fine-tuned embedding space through a kernel method. This discrepancy is averaged over multiple queries, taking into account the distribution characteristics of the target dataset. Experiments suggest that the proposed metric, calculated on pretrained models, closely aligns with retrieval performance after fine-tuning. This observation holds across a variety of datasets, spanning image, text, and multi-modal domains, indicating the potential of MMD and kernel methods for transfer-learning evaluation in retrieval scenarios. In addition, we design a way of evaluating dataset transferability for retrieval tasks, with experimental results demonstrating the effectiveness of the proposed approach.
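The abstract describes the metric only at a high level: a per-query MMD between relevant and irrelevant document embeddings, averaged over queries. As a rough illustration of that computation, here is a minimal NumPy sketch. All names (retmmd_score, embed, gamma), the RBF kernel choice, and the biased MMD estimator are assumptions for illustration, not the paper's actual implementation.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd_squared(X, Y, gamma=1.0):
    # Biased estimator of squared MMD between samples X and Y:
    # mean k(X, X) + mean k(Y, Y) - 2 * mean k(X, Y).
    k_xx = rbf_kernel(X, X, gamma)
    k_yy = rbf_kernel(Y, Y, gamma)
    k_xy = rbf_kernel(X, Y, gamma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

def retmmd_score(queries, relevant_docs, irrelevant_docs, embed, gamma=1.0):
    # Hypothetical RetMMD-style score: for each query, compute the squared
    # MMD between relevant and irrelevant document embeddings produced by
    # the pretrained encoder `embed`, then average over queries.
    scores = []
    for q in queries:
        rel = embed(relevant_docs[q])    # (n_rel, d) embedding matrix
        irr = embed(irrelevant_docs[q])  # (n_irr, d) embedding matrix
        scores.append(mmd_squared(rel, irr, gamma))
    return float(np.mean(scores))

Under this reading, a larger average discrepancy would indicate that the pretrained embedding space already separates relevant from irrelevant documents, which is what the paper reports as correlating with post-fine-tuning retrieval performance.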
