Good Classification Measures and How to Find Them

Martijn Gösgens; Anton Zhiyanov; Alexey Tikhonov; Liudmila Prokhorenkova

Published: 09 Nov 2021, Last Modified: 26 May 2025, NeurIPS 2021 Poster

Keywords: classification, classification evaluation, performance measures, symmetric balanced accuracy, correlation coefficient

TL;DR: A systematic theoretical analysis of classification performance measures: we define a list of desirable properties and check them for a number of known and novel measures.
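Checking a property can be as simple as testing an invariance across many confusion matrices. The sketch below is a numerical illustration under our own assumptions, not the paper's procedure: it assumes "symmetry" means a measure is unchanged when true and predicted labels exchange roles, which for a binary confusion matrix swaps false positives and false negatives. The helpers `Confusion`, `swap_roles`, and `is_symmetric` are hypothetical names introduced here. A sample-based check like this can only refute a property, never prove it.

```python
from typing import Callable, NamedTuple, Sequence


class Confusion(NamedTuple):
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int
    fp: int
    fn: int
    tn: int


def swap_roles(c: Confusion) -> Confusion:
    # Treating predictions as ground truth (and vice versa) turns every false
    # positive into a false negative and every false negative into a false positive.
    return Confusion(tp=c.tp, fp=c.fn, fn=c.fp, tn=c.tn)


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1(c: Confusion) -> float:
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def balanced_accuracy(c: Confusion) -> float:
    # Mean of the true positive rate (recall) and the true negative rate.
    tpr = c.tp / (c.tp + c.fn)
    tnr = c.tn / (c.tn + c.fp)
    return (tpr + tnr) / 2


def is_symmetric(measure: Callable[[Confusion], float],
                 samples: Sequence[Confusion], tol: float = 1e-12) -> bool:
    # Numerical check only: one counterexample disproves symmetry,
    # but passing on finitely many samples proves nothing.
    return all(abs(measure(c) - measure(swap_roles(c))) <= tol for c in samples)


samples = [Confusion(40, 10, 5, 45), Confusion(3, 7, 20, 70), Confusion(90, 1, 4, 5)]
for name, measure in [("accuracy", accuracy), ("F1", f1), ("balanced accuracy", balanced_accuracy)]:
    print(f"{name}: symmetric on samples = {is_symmetric(measure, samples)}")
# accuracy: symmetric on samples = True
# F1: symmetric on samples = True
# balanced accuracy: symmetric on samples = False
```

On these samples, accuracy and F1 are unchanged by the swap, whereas balanced accuracy is not: its two rates are normalized by the true-label margins only, so exchanging the roles of the labels changes the denominators.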

Abstract: Several performance measures can be used to evaluate classification results: accuracy, F-measure, and many others. Can we say that some of them are better than others, or, ideally, choose one measure that is best in all situations? To answer this question, we conduct a systematic analysis of classification performance measures: we formally define a list of desirable properties and theoretically analyze which measures satisfy which properties. We also prove an impossibility theorem: some desirable properties cannot be satisfied simultaneously. Finally, we propose a new family of measures that satisfies all desirable properties except one. This family includes the Matthews Correlation Coefficient and the so-called Symmetric Balanced Accuracy, which has not previously been used in the classification literature. We believe that our systematic approach gives practitioners an important tool for adequately evaluating classification results.
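One way to see how the Matthews Correlation Coefficient and a balanced-accuracy-style measure can belong to one family is through two rates-based quantities: informedness BM = TPR + TNR - 1 and markedness MK = PPV + NPV - 1. For binary classification, MCC equals the signed geometric mean sign(BM) * sqrt(BM * MK). The sketch below checks this identity numerically. As an assumption about the definition (consult the paper for the exact and multiclass forms), it takes Symmetric Balanced Accuracy in the binary case as (TPR + TNR + PPV + NPV) / 4, i.e., balanced accuracy averaged over both directions; under that assumption, 2 * SBA - 1 is the arithmetic mean of BM and MK. The function names are hypothetical, and degenerate matrices with an empty row or column are not handled.

```python
import math
from typing import NamedTuple, Tuple


class Confusion(NamedTuple):
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int
    fp: int
    fn: int
    tn: int


def rates(c: Confusion) -> Tuple[float, float, float, float]:
    """Return (TPR, TNR, PPV, NPV); assumes no row or column of the matrix is empty."""
    tpr = c.tp / (c.tp + c.fn)  # recall, sensitivity
    tnr = c.tn / (c.tn + c.fp)  # specificity
    ppv = c.tp / (c.tp + c.fp)  # precision
    npv = c.tn / (c.tn + c.fn)  # negative predictive value
    return tpr, tnr, ppv, npv


def mcc(c: Confusion) -> float:
    """Matthews Correlation Coefficient from the standard count formula."""
    num = c.tp * c.tn - c.fp * c.fn
    den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return num / den


def symmetric_balanced_accuracy(c: Confusion) -> float:
    """ASSUMED binary form: the mean of TPR, TNR, PPV and NPV."""
    return sum(rates(c)) / 4


def informedness_markedness(c: Confusion) -> Tuple[float, float]:
    tpr, tnr, ppv, npv = rates(c)
    return tpr + tnr - 1, ppv + npv - 1


c = Confusion(tp=40, fp=10, fn=5, tn=45)
bm, mk = informedness_markedness(c)
# MCC as the signed geometric mean of informedness and markedness.
print(mcc(c), math.copysign(math.sqrt(bm * mk), bm))
# Rescaled SBA as their arithmetic mean (under the assumed definition).
print(2 * symmetric_balanced_accuracy(c) - 1, (bm + mk) / 2)
```

Both quantities are symmetric under exchanging true and predicted labels: the swap turns TPR into PPV and TNR into NPV, so BM and MK trade places and any mean of the two is unchanged.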

Supplementary Material: pdf

Code: https://github.com/yandex-research/classification-measures
