Semantic Granularity Metric Learning for Visual Search

Dipu Manandhar, Muhammet Bastan, Kim-Hui Yap

16 Feb 2020, OpenReview Archive Direct Upload, Readers: Everyone

Abstract: Deep metric learning has shown promising results in multimedia tasks such as identification, retrieval and recognition. Existing metric learning methods often do not consider the different granularities of visual similarity. However, in many application domains, images exhibit similarity at multiple granularities of visual semantic concepts; e.g., in fashion, similarity ranges from clothing of the exact same instance, to similar looks or designs, to a common category. The image triplets/pairs used to train a metric therefore inherently carry different degrees of information, yet existing methods often treat them with equal importance during training. This hinders capturing the underlying granularities in feature similarity, which is required for effective visual search. In view of this, we propose a new deep semantic granularity metric learning (SGML) framework that detects and leverages the attribute semantic space to capture different granularities of similarity, and then integrates this information into deep metric learning. The proposed framework simultaneously learns image attributes and embeddings using multitask CNNs with shared parameters. The two tasks are not only jointly optimized but are further linked by semantic granularity similarity mappings that leverage the correlations between the tasks. To this end, we propose a new soft-binomial deviance loss that effectively integrates the degree of information in training samples, which helps to capture visual similarity at multiple granularities. Compared to recent ensemble-based methods, our framework is conceptually elegant, computationally simple and provides better performance. We perform extensive experiments on benchmark metric learning datasets and demonstrate that our method outperforms recent state-of-the-art methods, e.g., a 1-4.5% improvement in Recall@1 over the previous state of the art [1], [2] on the DeepFashion In-Shop dataset.
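
The abstract does not state the formula of the soft-binomial deviance loss, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes the standard binomial deviance loss, log(1 + exp(-alpha * (s - beta) * m)), and replaces the hard pair label m in {+1, -1} with a soft label derived from the cosine similarity of non-negative attribute vectors. The function names, the mapping m = 2 * s_attr - 1, and the values of alpha and beta are all assumptions made for this example.

```python
import numpy as np


def cosine_similarity_matrix(x, eps=1e-12):
    """Pairwise cosine similarity between the rows of x."""
    x = np.asarray(x, dtype=float)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    return x @ x.T


def soft_binomial_deviance(embeddings, attributes, alpha=2.0, beta=0.5):
    """Binomial deviance loss with soft, attribute-derived pair labels.

    Assumption (not from the paper): the soft label for a pair is
    m = 2 * s_attr - 1, where s_attr is the cosine similarity of the
    pair's non-negative attribute vectors, so m lies in [-1, 1].
    """
    s_emb = cosine_similarity_matrix(embeddings)
    s_attr = cosine_similarity_matrix(attributes)
    soft_label = 2.0 * s_attr - 1.0

    # Use each unordered pair once (upper triangle, no diagonal).
    rows, cols = np.triu_indices(len(s_emb), k=1)
    margin = alpha * (s_emb[rows, cols] - beta) * soft_label[rows, cols]

    # log(1 + exp(-margin)), computed in a numerically stable way.
    return np.logaddexp(0.0, -margin).mean()


# Hypothetical toy data: items 0 and 1 share attributes, item 2 does not.
attributes = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
misaligned = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

loss_aligned = soft_binomial_deviance(aligned, attributes)
loss_misaligned = soft_binomial_deviance(misaligned, attributes)
print(loss_aligned, loss_misaligned)
```

In this sketch, pairs whose attributes partly overlap receive a label of small magnitude and therefore contribute a weaker pull or push than exact matches or fully dissimilar pairs, which is one way to weight training pairs by the degree of information they carry.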
