Keywords: Content-Based Image Retrieval, Computer Vision, Disentangled Representation Learning, Generative Models
TL;DR: We evaluate weakly supervised disentangled representations and show that content-based image retrieval with respect to some semantic concepts can be improved with this kind of supervision.
Abstract: In content-based image retrieval (CBIR), a database of images is ordered by similarity to a query image. The similarity criterion is usually defined with respect to a shared category, e.g., whether the database images contain an object of the same type as the one depicted in the query. Depending on the situation, multiple similarity criteria can be relevant, such as the type of object, its color, or the depicted background. Ideally, a dataset labeled with respect to all possible criteria would be available for training a model that computes the similarity. Typically, this is not the case. In this paper, we explore the use of disentangled representations for CBIR with respect to multiple criteria. To alleviate the need for labels, the models used to create the representations are learned via weak supervision, using data organized into groups with shared information. We show that such models can attain better retrieval performance than unsupervised baselines.
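
As a rough illustration of retrieval with respect to multiple criteria (a toy sketch, not the models or the evaluation protocol used in this work), one can picture a representation whose dimensions are partitioned by criterion, and rank the database by similarity over only the dimensions tied to the criterion of interest. The function names, the choice of cosine similarity, the split into "object" and "color" dimensions, and the example vectors below are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_database(query, database, dims):
    """Return database indices ordered from most to least similar to the query,
    comparing only the representation dimensions listed in `dims`."""
    q = [query[d] for d in dims]
    scored = []
    for idx, emb in enumerate(database):
        e = [emb[d] for d in dims]
        scored.append((cosine_similarity(q, e), idx))
    # Sort by descending similarity; break ties by database index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored]


# Hypothetical 4-dimensional representations:
# dimensions 0-1 are assumed to encode object type, dimensions 2-3 color.
OBJECT_DIMS = [0, 1]
COLOR_DIMS = [2, 3]

query = [1.0, 0.0, 0.0, 1.0]
database = [
    [0.9, 0.1, 1.0, 0.0],  # similar object, different color
    [0.0, 1.0, 0.1, 0.9],  # different object, similar color
    [0.7, 0.7, 0.7, 0.7],  # in between on both criteria
]

by_object = rank_database(query, database, OBJECT_DIMS)
by_color = rank_database(query, database, COLOR_DIMS)
print("ranking by object type:", by_object)
print("ranking by color:", by_color)
```

The same query yields a different ordering for each criterion, which is the behavior a disentangled representation is meant to enable; how well learned dimensions actually separate the criteria depends on the model and the supervision.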