The SMOTE Paradox: Why a 92% Baseline Collapsed to 6%—A Systematic Review of 821 Papers in Imbalanced Learning (2020–2025)

TMLR Paper6827 Authors

06 Jan 2026 (modified: 17 Jan 2026) · Under review for TMLR · CC BY 4.0
Abstract: Class imbalance pervades production systems such as fraud detection, medical diagnosis, and industrial monitoring, yet handling it effectively remains challenging. For two decades, SMOTE has been the default solution, but practitioners increasingly abandon it at scale. We investigate this disconnect through a systematic review of 821 DBLP papers (2020–2025) and a bibliometric analysis of 4,985 Scopus records. Our analysis reveals the SMOTE Paradox: only 6% of high-impact papers executed SMOTE successfully at full scale; the remainder were blocked by memory exhaustion or preprocessing bottlenecks. The field has fragmented, with 30% of papers adopting generative models, 30% using cost-sensitive losses, and 40% employing hybrid approaches. Three factors explain SMOTE's decline. First, its $O(N \cdot N_{\text{min}} \cdot d)$ nearest-neighbor search requires 1.28 TB of memory at typical modern dataset sizes. Second, linear interpolation produces off-manifold artifacts whose magnitude scales as $\sqrt{d}$ in high dimensions. Third, CPU-bound preprocessing creates friction with GPU-centric training pipelines. We validate these findings through controlled experiments across seven datasets (196 trials, imbalance ratios 1.1:1 to 129:1). Statistical testing reveals no significant ROC-AUC differences between SMOTE and cost-sensitive baselines (Friedman $p=0.907$), despite SMOTE incurring a 2.7× computational overhead. However, cost-sensitive methods degrade severely at extreme imbalance (>40:1).
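To make the memory argument concrete, the back-of-envelope sketch below (not taken from the paper) estimates the footprint of materializing the minority-to-all distance matrix behind SMOTE's k-nearest-neighbor step. The dataset sizes (N = 10,000,000 rows, N_min = 16,000 minority samples, d = 256 features) are hypothetical choices for illustration only; they happen to land in the terabyte range cited above, but the paper's actual configurations may differ, and practical implementations (chunked brute-force or tree-based search) avoid holding the full matrix at once.

```python
# Illustrative estimate of the O(N * N_min * d) nearest-neighbor cost in SMOTE.
# All sizes below are hypothetical; adjust them to your own dataset.

def distance_matrix_bytes(n_total: int, n_minority: int, bytes_per_float: int = 8) -> int:
    """Memory for a dense (n_minority x n_total) float64 distance matrix."""
    return n_minority * n_total * bytes_per_float

def distance_flops(n_total: int, n_minority: int, n_features: int) -> int:
    """Rough count of per-feature distance computations: O(N * N_min * d)."""
    return n_minority * n_total * n_features

if __name__ == "__main__":
    N, N_min, d = 10_000_000, 16_000, 256  # hypothetical production-scale table
    mem_tb = distance_matrix_bytes(N, N_min) / 1e12
    ops = distance_flops(N, N_min, d)
    print(f"dense distance matrix: {mem_tb:.2f} TB")   # ~1.28 TB at these sizes
    print(f"distance computations: {ops:.2e}")         # ~4.10e13 operations
```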
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Ju_Sun1
Submission Number: 6827