Pl@ntNet-300K: a new plant image dataset for the evaluation of set-valued classifiers

Camille Garcin; Alexis Joly; Pierre Bonnet; Antoine Affouard; Jean-Christophe Lombardo; Mathias Chouet; Maximilien Servajean; Joseph Salmon

07 Jun 2021 (modified: 24 May 2023). Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1). Readers: Everyone

Keywords: dataset, ambiguity, top-k, set-valued classification, long tail, plant

TL;DR: This paper presents a novel image dataset with high intrinsic ambiguity, specifically built for evaluating and comparing set-valued classifiers.

Abstract: This paper presents a novel image dataset with high intrinsic ambiguity, specifically built for evaluating and comparing set-valued classifiers. The dataset, built from the database of the Pl@ntNet citizen observatory, consists of 306,146 images covering 1,081 species. We highlight two features of the dataset, inherent to the way the images are acquired and to the intrinsic diversity of plant morphology: (i) the dataset has a strong class imbalance, meaning that a few species account for most of the images; (ii) many species are visually similar, making identification difficult even for the expert eye. These two characteristics make the dataset well suited for the evaluation of set-valued classification methods and algorithms. We therefore recommend two set-valued evaluation metrics for the dataset (top-k and average-k), and we provide the results of a baseline approach based on a ResNet-50 trained with the cross-entropy loss.
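To make the two recommended metrics concrete, here is a minimal sketch in plain Python. The abstract does not define them, so the definitions below are assumptions: top-k accuracy predicts the k highest-scoring classes for every image, while average-k accuracy applies one global score threshold so that the predicted sets contain k labels on average, letting ambiguous images receive larger sets than easy ones. The function names and the toy scores are hypothetical and are not taken from the dataset or its baseline.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among their k highest scores."""
    hits = 0
    for row, y in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += y in top
    return hits / len(labels)


def average_k_accuracy(scores, labels, k):
    """Accuracy of sets that hold k labels on average (assumed definition).

    All (sample, class) pairs are ranked together and the n * k best are
    kept, which amounts to a single global threshold on the scores.
    Ties at the cutoff are broken by input order.
    """
    n = len(labels)
    pairs = sorted(
        ((row[c], i, c) for i, row in enumerate(scores) for c in range(len(row))),
        key=lambda t: t[0],
        reverse=True,
    )
    kept = {(i, c) for _, i, c in pairs[: n * k]}
    hits = sum((i, y) in kept for i, y in enumerate(labels))
    return hits / n


# Toy example: two easy images and one ambiguous image (true class ranked third).
scores = [
    [0.98, 0.01, 0.01],
    [0.97, 0.02, 0.01],
    [0.35, 0.33, 0.32],
]
labels = [0, 0, 2]

print(top_k_accuracy(scores, labels, 2))      # 2 of 3 images covered
print(average_k_accuracy(scores, labels, 2))  # 1.0, with set sizes 1, 2 and 3
```

In this toy case both metrics use two labels per image on average, but the average-k sets spend the label budget where the scores are ambiguous, which is the behaviour a dataset with many visually similar species is meant to reward.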

Supplementary Material: zip

URL: https://doi.org/10.5281/zenodo.4726653
