Model Selection of Discrete Classifiers under Class and Cost Distribution Change: An Empirical Study
Abstract: A variety of important machine learning applications require predictions on test data with
different characteristics than the data on which a model was trained and validated. In
particular, test data may have a different relative frequency of positives and negatives (i.e.,
class distribution) and/or different mislabeling costs of false positive and false negative
errors (i.e., cost distribution) than the training data. Selecting models that were
built under conditions substantially different from those under which they
will be applied is more challenging than selecting models for identical conditions. Several
approaches to this problem exist, but they have mostly been studied in theoretical contexts.
This paper presents an empirical evaluation of approaches for model selection under class and
cost distribution change, based on Receiver Operating Characteristic (ROC) analysis. The
analysis compares the ROC Convex Hull (ROCCH) method with other candidate approaches
for selecting discrete classifiers for several UCI Machine Learning Repository and simulated
datasets. Surprisingly, the ROCCH method did not perform well in the experiments, despite
being developed for this task. Instead, the results indicate that a reliable approach for
selecting discrete classifiers on the ROC convex hull is to select the model that optimizes
the cost metric of interest on the validation data (which has the same characteristics as the
training data) but weighted by the class and/or cost distributions anticipated at test time.
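The selection rule described above can be illustrated with a minimal sketch. It assumes the cost metric of interest is the standard expected misclassification cost per example, p * FNR * c_FN + (1 - p) * FPR * c_FP, where p is the positive rate anticipated at test time. The paper's exact metric and tie-breaking rules may differ. The names `RocPoint`, `expected_cost`, and `select_model` and the candidate operating points are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Sequence


class RocPoint(NamedTuple):
    """One discrete classifier, summarized by its validation ROC point."""
    name: str
    fpr: float  # false positive rate measured on validation data
    tpr: float  # true positive rate measured on validation data


def expected_cost(point: RocPoint, pos_rate: float,
                  cost_fp: float, cost_fn: float) -> float:
    """Expected cost per example under the anticipated test conditions.

    pos_rate, cost_fp, and cost_fn describe the class and cost
    distributions expected at test time (an assumption of this sketch),
    while fpr and tpr come from validation data.
    """
    fn_term = pos_rate * (1.0 - point.tpr) * cost_fn
    fp_term = (1.0 - pos_rate) * point.fpr * cost_fp
    return fn_term + fp_term


def select_model(points: Sequence[RocPoint], pos_rate: float,
                 cost_fp: float = 1.0, cost_fn: float = 1.0) -> RocPoint:
    """Return the candidate with the lowest reweighted expected cost."""
    return min(points,
               key=lambda p: expected_cost(p, pos_rate, cost_fp, cost_fn))


# Hypothetical validation operating points for three discrete classifiers.
candidates = [
    RocPoint("conservative", fpr=0.05, tpr=0.60),
    RocPoint("balanced", fpr=0.20, tpr=0.85),
    RocPoint("liberal", fpr=0.50, tpr=0.98),
]

if __name__ == "__main__":
    # Balanced classes with equal costs, then rare positives at test time.
    print(select_model(candidates, pos_rate=0.5).name)
    print(select_model(candidates, pos_rate=0.05).name)
```

In this sketch only the weights change between scenarios. The validation ROC points stay fixed, which is what allows a model to be re-selected for new test conditions without retraining.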
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have revised the manuscript to improve readability and added more background information in places. We also added discussion of topics raised by the reviewers.
1. Section 1: We have deemphasized what was previously the second bullet describing our contributions, and have emphasized the need for caution when using the ROCCH method in empirical settings.
2. Section 2: We stated assumptions more explicitly and added more mathematical notation to make the exposition easier to follow. More background information on the ROC convex hull and Step 4 of the ROCCH method was added.
3. Section 3: Table 1 is now referenced early on in the section.
4. Section 4: The discussion of tie-breaking was reorganized and more information was added.
5. Section 6: We added more details about the nuances of our findings. Assumptions and notation for covariate shift were added.
6. Section 7: We introduced a paragraph on causality, domain adaptation, and counterfactual estimation to the Related Work section.
7. Section 8: We introduced a paragraph on the mismatch between expected and actual testing conditions to the Future Work section.
A paragraph in the Related Work section was mistakenly repeated. We have corrected this.
Assigned Action Editor: ~Abhishek_Kumar1
Submission Number: 1413