LRVS-Fashion: Extending Visual Search with Referring Instructions

Simon Lepage; Jeremie Mary; David Picard

13 May 2024 (modified: 13 Nov 2024) | Submitted to NeurIPS 2024 Track Datasets and Benchmarks | CC BY 4.0

Keywords: Visual Search, Image Embedding, Retrieval

TL;DR: This paper introduces a large dataset of paired fashion images dedicated to Referred Visual Search, a conditional embedding task.

Abstract: This paper introduces a new challenge for image similarity search in the context of fashion, addressing the ambiguity inherent to this domain, where complex images often depict several items at once. We present Referred Visual Search (RVS), a task that lets users specify more precisely which similarity they are looking for, following recent interest in the industry. We release a new large public dataset, LRVS-Fashion, consisting of 272k fashion products with 842k images extracted from fashion catalogs, designed explicitly for this task. In contrast to traditional visual search methods used in the industry, we demonstrate that superior performance can be achieved by bypassing explicit object detection and adopting weakly-supervised conditional contrastive learning on image tuples. Our method is lightweight and robust, reaching a recall at one (R@1) superior to that of strong detection-based baselines against 2M distractors.
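
As a rough illustration of the evaluation metric named in the abstract, recall at one against distractors can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the toy embeddings, the plain dot-product similarity, and the function name `recall_at_1` are all hypothetical, and the paper's actual protocol may differ.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product, used here as an assumed similarity score."""
    return sum(x * y for x, y in zip(a, b))


def recall_at_1(
    queries: List[Sequence[float]],
    gallery: List[Sequence[float]],
    targets: List[int],
) -> float:
    """Fraction of queries whose highest-scoring gallery item is its target.

    `gallery` holds the target embeddings plus any distractors;
    `targets[i]` is the gallery index of the correct item for query i.
    """
    hits = 0
    for query, target in zip(queries, targets):
        scores = [dot(query, item) for item in gallery]
        best = max(range(len(gallery)), key=scores.__getitem__)
        hits += int(best == target)
    return hits / len(queries)


# Toy example (hypothetical 2-D embeddings): two targets and one distractor.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # index 2 is a distractor
queries = [[0.9, 0.1], [0.2, 0.8]]
targets = [0, 1]
score = recall_at_1(queries, gallery, targets)
```

Adding distractors to the gallery can only keep this score the same or lower it, which is why reporting it against a large distractor set is a demanding test of an embedding.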

Supplementary Material: pdf

Flagged For Ethics Review: true

Submission Number: 248
