ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains

ICLR 2026 Conference Submission 4890 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026, ICLR 2026, CC BY 4.0
Keywords: instance-level image retrieval, image re-ranking, local similarity, generalization, interpretability
TL;DR: a new model and an extensive evaluation benchmark for domain generalization of instance-level image retrieval re-ranking using local descriptors
Abstract: Large-scale instance-level training data is scarce, so models are typically trained on domain-specific datasets. Yet in real-world retrieval, they must handle diverse domains, making generalization to unseen data critical. We introduce ELViS, an image-to-image similarity model that generalizes effectively to unseen domains. Unlike conventional approaches, our model operates in similarity space rather than representation space, promoting cross-domain transfer. It leverages local descriptor correspondences, refines their similarities through an optimal transport step with data-dependent gains that suppress uninformative descriptors, and aggregates strong correspondences via a voting process into an image-level similarity. This design injects strong inductive biases, yielding a simple, efficient, and interpretable model. To assess generalization, we compile a benchmark of eight datasets spanning landmarks, artworks, products, and multi-domain collections, and evaluate ELViS as a re-ranking method. Our experiments show that ELViS outperforms competing methods by a large margin in out-of-domain scenarios and on average, while requiring only a fraction of their computational cost.
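The abstract describes a three-step pipeline: raw local-descriptor similarities, refinement via optimal transport, and aggregation of strong correspondences into an image-level score. Below is a minimal numpy sketch of that general idea, not the authors' implementation: it uses plain entropic Sinkhorn with uniform marginals, omits the paper's data-dependent gains and its voting scheme (replaced here by a crude top-k sum), and all function names and hyperparameters (eps, n_iters, top_k) are illustrative assumptions.

```python
# Minimal sketch (NOT the authors' code): local-descriptor similarities
# refined by entropic optimal transport, then aggregated into one
# image-level similarity score.
import numpy as np

def sinkhorn(S, eps=0.05, n_iters=50):
    """Entropic OT over a similarity matrix S (n x m) with uniform
    marginals. Returns a transport plan whose mass concentrates on
    strong, mutually consistent correspondences."""
    K = np.exp(S / eps)                      # Gibbs kernel from similarities
    u = np.ones(S.shape[0]) / S.shape[0]     # uniform source marginal
    v = np.ones(S.shape[1]) / S.shape[1]     # uniform target marginal
    a, b = np.ones_like(u), np.ones_like(v)
    for _ in range(n_iters):                 # alternating marginal scaling
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :]       # transport plan P

def image_similarity(X, Y, top_k=100):
    """Image-level similarity from local descriptors X (n x d), Y (m x d).
    Descriptors are assumed L2-normalized, so X @ Y.T is cosine
    similarity. The top-k sum is a stand-in for the voting step."""
    S = X @ Y.T                              # raw local similarities
    P = sinkhorn(S)                          # OT-refined correspondence weights
    scores = (P * S).ravel()                 # weight each match by its plan mass
    return np.sort(scores)[-top_k:].sum()    # aggregate strongest correspondences

# Toy usage: two images with random unit-norm local descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128)); X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = rng.normal(size=(600, 128)); Y /= np.linalg.norm(Y, axis=1, keepdims=True)
print(image_similarity(X, Y))
```

Because such a model operates only on similarity matrices between descriptor sets, not on the descriptors' representation space itself, it is plausible that it transfers across domains, which matches the abstract's stated motivation.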
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4890