Over the Top-1: Uncertainty-Aware Cross-Modal Retrieval with CLIP

Published: 07 May 2025, Last Modified: 13 Jun 2025, Venue: UAI 2025 Poster, License: CC BY 4.0
Keywords: multi-modal, cross-modal retrieval, CLIP, predictive uncertainty
TL;DR: We investigate training-free methods for quantifying uncertainty in cross-modal retrieval with CLIP.
Abstract: State-of-the-art vision-language models, such as CLIP, achieve remarkable performance in cross-modal retrieval tasks, yet estimating their predictive uncertainty remains an open challenge. While recent works have explored probabilistic embeddings to quantify retrieval uncertainty, these approaches often require retraining the model or fine-tuning adapters, making them computationally expensive and dataset-dependent. In this work, we propose a training-free framework for uncertainty estimation in cross-modal retrieval. We start with a simple yet effective baseline that uses the cosine similarity between a query and its top-ranked match as an uncertainty measure. Building on this, we introduce a method that estimates uncertainty by analyzing the consistency of the top-1 retrieved item across samples drawn from the posterior predictive distribution using Monte Carlo Dropout (MCD) or Deep Ensembles. Finally, we propose an adversarial perturbation-based method, where the minimal perturbation required to alter the top-1 retrieval serves as a robust indicator of uncertainty. Empirical results on two standard cross-modal retrieval benchmarks demonstrate that our approach achieves superior calibration compared to learned probabilistic methods, while incurring zero additional training cost.
Latex Source Code: zip
Code Link: http://github.com/lluisgomez/uCLIP
Signed PMLR Licence Agreement: pdf
Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission5/Authors, auai.org/UAI/2025/Conference/Submission5/Reproducibility_Reviewers
Submission Number: 5
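
The three uncertainty measures named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the Code Link for that). It assumes embeddings have already been computed by some CLIP encoder, and all function names are hypothetical. The `1 - similarity` mapping, the modal-vote agreement score, and in particular the embedding-space margin used in place of an adversarial perturbation are assumptions made here for illustration; the abstract does not specify these details.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def top1_similarity_uncertainty(query, gallery):
    """Baseline: 1 - cosine similarity between the query and its top-1 match.

    The "1 - similarity" mapping is an assumption, chosen so that larger
    values mean more uncertainty.
    """
    sims = l2_normalize(gallery) @ l2_normalize(query)  # shape (N,)
    top1 = int(np.argmax(sims))
    return top1, 1.0 - float(sims[top1])


def top1_consistency_uncertainty(query_samples, gallery):
    """Consistency: 1 - share of stochastic passes agreeing on the modal top-1.

    query_samples has shape (S, D): S embeddings of the same query, e.g. from
    S dropout-enabled forward passes or S ensemble members (computed elsewhere).
    """
    sims = l2_normalize(query_samples) @ l2_normalize(gallery).T  # shape (S, N)
    top1_per_sample = np.argmax(sims, axis=1)
    counts = np.bincount(top1_per_sample, minlength=gallery.shape[0])
    modal = int(np.argmax(counts))
    return modal, 1.0 - float(counts[modal]) / len(top1_per_sample)


def top1_flip_margin(query, gallery):
    """Embedding-space stand-in for the perturbation idea (an assumption).

    Returns the smallest L2 shift of the normalized query embedding that makes
    some other gallery item tie with the current top-1. A small margin means
    the top-1 result is easy to flip, i.e. the retrieval is more uncertain.
    """
    q, g = l2_normalize(query), l2_normalize(gallery)
    sims = g @ q
    top1 = int(np.argmax(sims))
    others = np.arange(len(g)) != top1
    gaps = sims[top1] - sims[others]
    dists = np.linalg.norm(g[top1] - g[others], axis=1)
    # Distance from q to each decision boundary {x : x . (g_top1 - g_j) = 0}.
    return top1, float(np.min(gaps / np.maximum(dists, 1e-12)))
```

Each function takes a query embedding (or a stack of stochastic query embeddings) and a gallery matrix of shape `(N, D)`, and returns the retrieved index together with a scalar score. For the first two, higher means more uncertain; for the margin, lower means more uncertain.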