- TL;DR: Combining classification and image retrieval in a neural network architecture, we obtain an improvement for both tasks.
- Abstract: We introduce MultiGrain, a neural network architecture that generates compact image embedding vectors that solve multiple tasks of different granularity: class, instance, and copy recognition. MultiGrain is trained jointly for classification by optimizing the cross-entropy loss and for instance/copy recognition by optimizing a self-supervised ranking loss. The self-supervised loss only uses data augmentation and thus does not require additional labels. Remarkably, the unified embeddings are not only much more compact than using several specialized embeddings, but they also have the same or better accuracy. When fed to a linear classifier, MultiGrain using ResNet-50 achieves 79.4% top-1 accuracy on ImageNet, a +1.8% absolute improvement over the the current state-of-the-art AutoAugment method. The same embeddings perform on par with state-of-the-art instance retrieval with images of moderate resolution. An ablation study shows that our approach benefits from the self-supervision, the pooling method and the mini-batches with repeated augmentations of the same image.
- Keywords: classification, image retrieval, deep learning, data augmentation