Learning Curves of Classification Metrics based on Confusion Matrices

Yan Xue; Ruibo Wang; Xuefei Cao; Jing Yang; Jihong Li

Published: 01 Sept 2025, Last Modified: 18 Nov 2025, ACML 2025 Conference Track, CC BY 4.0

Abstract: Learning curves of classification metrics, including test error, precision (P), recall (R), and F$_1$ score, as functions of training set size have recently attracted attention as a basis for model selection and hyperparameter optimization. Existing studies concentrate on formulating the functional shape of well-behaved learning curves of test error under a normality assumption. However, this assumption is unreasonable for classification metrics: the distributions of most of them, such as P, R, and F$_1$ score, are skewed, and interval estimates based on normality may fall outside [0,1]. In this study, noting that most classification metrics are computed from confusion matrices, we develop a novel method that formulates their learning curves by assuming that the four entries of a confusion matrix jointly follow a multinomial distribution rather than a normal distribution. Furthermore, each entry of the confusion matrix is modeled as an exponential-form function of the training set size. The learning curve of a classification metric is then obtained by transforming these entry-wise functions according to the definition of the metric. Moreover, reasonable confidence bands for several popular metrics, including test error, P, R, and F$_1$ score, are derived from the multinomial assumption on the confusion matrix. Extensive experiments are conducted on several synthetic and real-world data sets with multiple typical non-neural and neural classification algorithms. The experimental results illustrate the improvements of the proposed learning curves of test error, P, R, and F$_1$ score and the superiority of the confidence bands.
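
To make the central idea concrete, the following is a minimal sketch, not the authors' derivation: it treats the four entries of a single confusion matrix as one multinomial draw and obtains percentile intervals for test error, P, R, and F$_1$ by parametric resampling. The function names `metrics` and `multinomial_bands`, the example counts, and the use of Monte Carlo resampling in place of the paper's analytical confidence bands are all assumptions made for illustration. Because every resampled metric is a ratio of non-negative counts, the resulting intervals stay inside [0,1], which is the property the abstract contrasts with normal-based intervals.

```python
import numpy as np


def metrics(tp, fp, fn, tn):
    """Test error, precision, recall and F1 from confusion-matrix entries.

    Inputs may be scalars or arrays; undefined ratios (0/0) become NaN.
    """
    tp, fp, fn, tn = (np.asarray(x, dtype=float) for x in (tp, fp, fn, tn))
    total = tp + fp + fn + tn
    with np.errstate(divide="ignore", invalid="ignore"):
        return {
            "error": (fp + fn) / total,
            "precision": tp / (tp + fp),
            "recall": tp / (tp + fn),
            "f1": 2 * tp / (2 * tp + fp + fn),
        }


def multinomial_bands(counts, level=0.95, draws=20000, seed=0):
    """Percentile intervals from multinomial resampling of (TP, FP, FN, TN).

    Illustrative assumption: the observed cell frequencies are used as the
    multinomial probabilities (a parametric bootstrap).
    """
    counts = np.asarray(counts, dtype=float)
    n = int(counts.sum())
    rng = np.random.default_rng(seed)
    # Each row is one simulated confusion matrix with the same total n.
    resampled = rng.multinomial(n, counts / n, size=draws)
    sampled = metrics(*resampled.T)
    alpha = (1.0 - level) / 2.0
    return {
        name: tuple(np.nanquantile(values, [alpha, 1.0 - alpha]))
        for name, values in sampled.items()
    }


# Hypothetical confusion matrix: TP=40, FP=10, FN=5, TN=45.
example_counts = (40, 10, 5, 45)
point = metrics(*example_counts)
bands = multinomial_bands(example_counts)
for name in point:
    lo, hi = bands[name]
    print(f"{name}: {float(point[name]):.3f}  [{lo:.3f}, {hi:.3f}]")
```

Repeating this at several training set sizes would give one interval per size; the exponential form the paper fits to each entry is not specified in the abstract and is therefore not reproduced here.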

Supplementary Material: pdf

Submission Number: 295
