Classifiers Should Do Well Even on Their Worst Classes

Published: 12 Jul 2022, Last Modified: 05 May 2023 · Shift Happens 2022 Poster
Abstract: The performance of a vision classifier on a given test set is usually measured by its accuracy. For reliable machine learning systems, however, it is important to avoid areas of the input space where they fail severely. To reflect this, we argue that a single number does not provide a complete enough picture even for a fixed test set, as there might be particular classes or subtasks where a generally accurate model performs unexpectedly poorly. Without using new data, we motivate and establish a wide selection of worst-case performance metrics that can be evaluated alongside accuracy on a given test set. Some of these metrics can be extended when a grouping of the original classes into superclasses is available, indicating whether the model is exceptionally bad at handling inputs from one superclass.
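To make the idea concrete, the simplest metric in this family is worst-class accuracy: the minimum per-class accuracy over all classes in the test set, optionally aggregated at the superclass level. Below is a minimal sketch of both quantities, assuming NumPy arrays of true and predicted labels and a hypothetical class-to-superclass mapping; the function names and exact definitions are illustrative, not taken from the report.

```python
import numpy as np

def worst_class_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Minimum per-class accuracy over all classes present in y_true."""
    classes = np.unique(y_true)
    return min(np.mean(y_pred[y_true == c] == c) for c in classes)

def worst_superclass_accuracy(y_true: np.ndarray, y_pred: np.ndarray,
                              superclass_of: dict) -> float:
    """Minimum accuracy over superclasses, given a class -> superclass map."""
    groups = np.array([superclass_of[c] for c in y_true])
    return min(np.mean(y_pred[groups == g] == y_true[groups == g])
               for g in np.unique(groups))

# Example: a model with 80% overall accuracy can still fail completely
# on its worst class (here, class 2) and on the superclass containing it.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1])
print(worst_class_accuracy(y_true, y_pred))  # 0.0
print(worst_superclass_accuracy(y_true, y_pred, {0: "a", 1: "a", 2: "b"}))  # 0.0
```

This illustrates the paper's motivation: aggregate accuracy (0.8 here) hides that one class, and the superclass built from it, is never predicted correctly.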
Submission Type: Full submission (technical report + code/data)
Co Submission: No, I am not submitting to the dataset and benchmark track and will complete my submission by June 3.