Analyzing PETs on Imbalanced Datasets When Training and Testing Class Distributions Differ

David A. Cieslak, Nitesh V. Chawla

Published: 2008, PAKDD 2008. Last Modified: 20 Jul 2025. License: CC BY-SA 4.0

Abstract: Many machine learning applications in domains such as finance, medicine, and risk management suffer from class imbalance: the cases of interest occur rarely. These applications are further complicated when the training and testing samples may differ significantly in their class distributions. Sampling has been shown to be a strong solution to imbalance, and it additionally offers a rich parameter space from which to select classifiers. This paper is concerned with the interaction between Probability Estimation Trees (PETs) [1], sampling, and performance metrics as testing distributions fluctuate substantially. A set of comprehensive analyses is presented that anticipates classifier performance across a range of widely varying testing distributions.
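To make concrete why the choice of performance metric interacts with a shifting testing distribution, the following is a minimal sketch, not the authors' method. It assumes a classifier fixed at one operating point, described by a true positive rate (TPR) and false positive rate (FPR). These two rates are conditioned on the true class, so they do not change when only the positive-class fraction `pi` of the testing set changes. Accuracy and precision, by contrast, mix both classes and move with `pi`. The names `Rates` and `metrics_at_prior`, as well as the example rates, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical fixed operating point of a trained classifier."""
    tpr: float  # true positive rate: fraction of minority (positive) cases caught
    fpr: float  # false positive rate: fraction of majority (negative) cases flagged


def metrics_at_prior(r: Rates, pi: float) -> dict:
    """Expected metrics when the testing set has positive-class fraction pi.

    TPR and FPR are conditioned on the true class, so they are unaffected by pi;
    accuracy and precision combine both classes and therefore depend on pi.
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    tp = r.tpr * pi               # expected fraction of true positives
    fp = r.fpr * (1.0 - pi)       # expected fraction of false positives
    tn = (1.0 - r.fpr) * (1.0 - pi)
    accuracy = tp + tn
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    return {"tpr": r.tpr, "fpr": r.fpr, "accuracy": accuracy, "precision": precision}


if __name__ == "__main__":
    clf = Rates(tpr=0.8, fpr=0.05)
    # Sweep the testing distribution from balanced to highly imbalanced.
    for pi in (0.5, 0.1, 0.01):
        m = metrics_at_prior(clf, pi)
        print(f"pi={pi:<5} accuracy={m['accuracy']:.3f} precision={m['precision']:.3f}")
```

With TPR 0.8 and FPR 0.05, precision falls from about 0.94 on a balanced testing set to about 0.14 when only 1% of cases are positive, while accuracy rises from 0.875 to about 0.95. The same classifier can therefore look better or worse depending on which metric is used and which testing distribution is assumed.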