Classifier Risk Estimation Under Limited Labeling Resources.

Anurag Kumar, Bhiksha Raj

2018 (modified: 09 Nov 2022), PAKDD (1) 2018

Abstract: Evaluating a trained system is an important component of machine learning. Labeling test data for large-scale evaluation of a trained model can be extremely time-consuming and expensive. In this paper we propose strategies for estimating the performance of a classifier using as few labeling resources as possible. Specifically, we assume a labeling budget is given, and the goal is to obtain a good estimate of the classifier's performance within that budget. We propose strategies that yield a precise estimate of classifier accuracy under this restricted-budget scenario. We show that these strategies can substantially reduce the variance of the accuracy estimate compared to simple random sampling (by over 65% in several cases). In terms of labeling resources, the reduction in the number of samples required (compared to random sampling) to estimate the classifier accuracy to within 1% error is as high as 60% in some cases.
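The abstract does not spell out the sampling strategies themselves. As a minimal sketch of the general idea only, the Python below compares simple random sampling (SRS) with stratified sampling under proportional allocation, a standard variance-reduction technique. It is not necessarily the authors' method. It assumes a simulated pool whose true correctness labels are already known, and it uses hypothetical classifier-confidence bins as strata. All function names, bin accuracies, and budget values are illustrative assumptions.

```python
import random
import statistics
from typing import Dict, List, Sequence


def srs_accuracy(correct: Sequence[int], budget: int, rng: random.Random) -> float:
    """Simple random sampling: label `budget` items drawn uniformly without replacement."""
    idx = rng.sample(range(len(correct)), budget)
    return sum(correct[i] for i in idx) / budget


def stratified_accuracy(correct: Sequence[int], groups: Dict[int, List[int]],
                        budget: int, rng: random.Random) -> float:
    """Stratified sampling with proportional allocation (illustrative sketch).

    `groups` maps a stratum id (here, a hypothetical confidence bin) to the
    indices of the items in that stratum. Because of rounding, the total
    number of labels used can differ slightly from `budget`.
    """
    n_total = len(correct)
    estimate = 0.0
    for members in groups.values():
        weight = len(members) / n_total                       # W_h = N_h / N
        n_h = min(len(members), max(1, round(budget * weight)))
        idx = rng.sample(members, n_h)
        p_h = sum(correct[i] for i in idx) / n_h              # per-stratum accuracy
        estimate += weight * p_h                              # sum_h W_h * p_h
    return estimate


def simulate(trials: int = 1000, budget: int = 200, seed: int = 0):
    # Hypothetical pool: 5 confidence bins of 2000 items each, with accuracy
    # rising with confidence. Correctness is fixed here, not sampled.
    bin_acc = [0.3, 0.6, 0.85, 0.95, 0.99]
    per_bin = 2000
    correct: List[int] = []
    groups: Dict[int, List[int]] = {}
    for h, acc in enumerate(bin_acc):
        n_correct = round(acc * per_bin)
        start = len(correct)
        correct.extend([1] * n_correct + [0] * (per_bin - n_correct))
        groups[h] = list(range(start, start + per_bin))
    true_acc = sum(correct) / len(correct)

    rng = random.Random(seed)
    srs = [srs_accuracy(correct, budget, rng) for _ in range(trials)]
    strat = [stratified_accuracy(correct, groups, budget, rng) for _ in range(trials)]
    return true_acc, srs, strat


true_acc, srs, strat = simulate()
print(f"true accuracy       : {true_acc:.3f}")
print(f"SRS variance        : {statistics.pvariance(srs):.2e}")
print(f"stratified variance : {statistics.pvariance(strat):.2e}")
```

With these made-up bin accuracies, both estimators are centered on the true accuracy. The stratified one should show noticeably lower variance for the same labeling budget, because proportional allocation removes the between-bin component of the variance. How large the gain is in practice depends on how well the strata separate correct from incorrect predictions.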
