Abstract: Threshold-moving is one of several techniques for correcting the bias of binary classifiers towards the majority class. In this approach, the decision threshold is adjusted to detect the minority class at the cost of increased misclassification of the majority. In practice, selecting a good threshold by cross-validation on the training data is not feasible for some problems because only a few minority samples are available. This study addresses the threshold estimation problem in the case of rare positive samples by building a meta-learner for threshold prediction. Novel meta-features are proposed to quantify the imbalance characteristics of the datasets and the patterns among the prediction scores. A random forest-based threshold prediction model is constructed using these meta-features extracted from the score space of external data. The resulting models are then employed to estimate the optimal thresholds for previously unseen datasets. The random forest-based meta-learner, which employs an implicitly selected subset of the proposed meta-features and encodes information from multiple external sources in the form of different trees, is evaluated on 52 imbalanced datasets. In the first set of experiments, the best-fitting thresholds are computed for SVM and logistic regression classifiers trained on the original imbalanced training sets. The experiments are then repeated using ensembles of multiple learners, each trained on a different balanced dataset. The proposed approach is observed to provide a better F-score than alternative threshold-moving and balancing techniques.
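To make the baseline concrete, the following is a minimal sketch of plain threshold-moving: a classifier is trained on imbalanced data and the decision threshold maximizing the F-score is selected on held-out data. The synthetic dataset, variable names, and validation-based sweep are illustrative assumptions only; the paper's contribution is to replace this validation-based selection with a random forest meta-learner that predicts the threshold from meta-features when too few positives are available for reliable validation.

```python
# Minimal sketch of threshold-moving on an imbalanced binary problem (assumption:
# scikit-learn, synthetic data). This is NOT the paper's meta-learning procedure,
# which predicts the threshold instead of tuning it on held-out samples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: ~5% positives.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_va)[:, 1]  # predicted probabilities for the minority class

# Sweep candidate thresholds and keep the one maximizing F-score on validation data.
# With very few positives this estimate is noisy, which motivates the meta-learner.
thresholds = np.linspace(0.01, 0.99, 99)
best_t = max(thresholds, key=lambda t: f1_score(y_va, (scores >= t).astype(int)))
print(f"selected threshold: {best_t:.2f}")
```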