The Met Dataset: Instance-level Recognition for ArtworksDownload PDF

Published: 11 Oct 2021, Last Modified: 22 Oct 2023NeurIPS 2021 Datasets and Benchmarks Track (Round 2)Readers: Everyone
Keywords: instance-level recognition, artwork recognition, large-scale recognition dataset, knn classification
Abstract: This work introduces a dataset for large-scale instance-level recognition in the domain of artworks. The proposed benchmark exhibits a number of different challenges such as large inter-class similarity, long tail distribution, and many classes. We rely on the open access collection of The Met museum to form a large training set of about 224k classes, where each class corresponds to a museum exhibit with photos taken under studio conditions. Testing is primarily performed on photos taken by museum guests depicting exhibits, which introduces a distribution shift between training and testing. Testing is additionally performed on a set of images not related to Met exhibits making the task resemble an out-of-distribution detection problem. The proposed benchmark follows the paradigm of other recent datasets for instance level recognition on different domains to encourage research on domain independent approaches. A number of suitable approaches are evaluated to offer a testbed for future comparisons. Self-supervised and supervised contrastive learning are effectively combined to train the backbone which is used for non-parametric classification that is shown as a promising direction. Dataset webpage:
TL;DR: A large-scale dataset for instance-level recognition for artworks is introduced.
Supplementary Material: pdf
Contribution Process Agreement: Yes
Dataset Url:
License: Dataset license: The annotations are licensed under CC BY 4.0 license. The images included in the dataset are either publicly available on the web, and come from three sources, i.e. the Met open collection, Flickr, and WikiMedia commons, or are created by us. The corresponding licenses for the ones that are available on the web are public domain, Creative Commons, and public domain, respectively. We do not own their copyright. For the ones created by us, we release them to the public domain. Code license: The code that accompanies the dataset and is hosted at ( is licensed under the MIT License.
Author Statement: Yes
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](
13 Replies