Severe Class Imbalance: Why Better Algorithms Aren't the Answer

Published: 01 Jan 2005, Last Modified: 25 Jan 2025, Venue: ECML 2005, License: CC BY-SA 4.0
Abstract: This paper argues that severe class imbalance is not merely an interesting technical challenge that improved learning algorithms will address; it is a much more serious problem. To be useful, a classifier must appreciably outperform a trivial solution, such as always choosing the majority class. Inherent noise in an application limits the error rate, and cost, that is achievable. When data are normally distributed, even a Bayes optimal classifier achieves only a vanishingly small reduction in the majority classifier’s error rate, and cost, as imbalance increases. For fat-tailed distributions, and when practical classifiers are used, often no reduction is achieved at all.
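The claim about normally distributed data can be illustrated with a small numerical sketch. The setup below is an assumption made for illustration and is not taken from the paper: one feature, a majority class distributed as N(0, 1), a minority class distributed as N(d, 1), and a minority prior p. The function names and the choice d = 1 are hypothetical. The Bayes optimal rule then predicts the minority class when x exceeds the threshold t = d/2 + ln((1 - p)/p)/d, and the majority classifier's error rate is simply p.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, computed via erfc for accuracy in the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bayes_threshold(p: float, d: float) -> float:
    """Decision threshold of the Bayes optimal rule (assumed equal-variance setup).

    Majority ~ N(0, 1) with prior 1 - p, minority ~ N(d, 1) with prior p.
    Predict minority when x > t.
    """
    return d / 2.0 + math.log((1.0 - p) / p) / d


def relative_error_reduction(p: float, d: float) -> float:
    """Fraction by which the Bayes rule reduces the majority classifier's error p.

    Bayes error = (1 - p) * P(x > t | majority) + p * P(x < t | minority).
    The reduction (p - Bayes error) / p simplifies to
    Q(t - d) - ((1 - p) / p) * Q(t), with Q the standard normal upper tail.
    """
    t = bayes_threshold(p, d)
    return upper_tail(t - d) - ((1.0 - p) / p) * upper_tail(t)


if __name__ == "__main__":
    d = 1.0  # assumed separation between class means, in standard deviations
    for p in (0.5, 0.2, 0.1, 0.05, 0.01, 0.001):
        r = relative_error_reduction(p, d)
        print(f"minority prior {p:>6}: relative error reduction {r:.3e}")
```

Under these assumptions the relative reduction shrinks rapidly as p falls, which matches the behaviour the abstract describes. A larger d delays the collapse but does not prevent it.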