Not All Imbalance Is Random: Cluster-Balanced Ensembling for Missing-Not-At-Random Class Imbalance

ICLR 2026 Conference Submission 15848 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Class Imbalance, Missing-Not-At-Random, Ensemble Methods, Cluster-Based Undersampling
TL;DR: This paper shows that the nature of class imbalance is as critical as the correction method, and introduces a cluster-balanced ensembling approach to address Missing-Not-At-Random imbalance.
Abstract: Class imbalance methods inherently assume that observed minority instances are representative of their class and Missing At Random (MAR). However, in many real-world settings, minority instances are Missing Not At Random (MNAR), with observability shaped by both class and feature values. This leads to structurally biased samples, introducing a deeper challenge that goes beyond class-count imbalance. We show that when MNAR affects high-impact features, popular imbalance methods overfit the observed minority and fail to generalize. To address this, we propose a simple yet effective cluster-balanced ensemble approach that constructs diverse, near-balanced training sets by pairing all minority instances with different clusters of the majority class. Extensive experiments identify MNAR conditions under which our approach improves F1 scores over existing methods, and when it does not. We also introduce an evaluation protocol using representative balanced test sets, demonstrating that standard hold-out testing on MNAR data can mislead performance assessments. Our findings underscore that the cause of imbalance is as critical as the correction method.
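The core construction described in the abstract — pairing all minority instances with different clusters of the majority class to form diverse, near-balanced training sets — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of scikit-learn's `KMeans` and `LogisticRegression`, the cluster count, and the probability-averaging combiner are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def cluster_balanced_ensemble(X, y, n_clusters=5, random_state=0):
    """Sketch: one base model per majority cluster, each trained on a
    near-balanced set of (all minority instances + one majority cluster)."""
    X, y = np.asarray(X), np.asarray(y)
    minority = int(np.bincount(y).argmin())      # label of the rarer class
    X_min = X[y == minority]
    X_maj = X[y != minority]
    majority = int(y[y != minority][0])
    # Partition the majority class into clusters (KMeans is an assumption;
    # any clustering producing diverse majority subsets would fit the idea).
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=random_state).fit_predict(X_maj)
    models = []
    for c in range(n_clusters):
        X_c = X_maj[clusters == c]               # one majority cluster
        X_train = np.vstack([X_min, X_c])
        y_train = np.concatenate([np.full(len(X_min), minority),
                                  np.full(len(X_c), majority)])
        models.append(LogisticRegression(max_iter=1000).fit(X_train, y_train))
    return models, minority

def predict_ensemble(models, minority, X):
    """Combine members by averaging their minority-class probabilities."""
    probs = np.mean([m.predict_proba(X)[:, list(m.classes_).index(minority)]
                     for m in models], axis=0)
    return (probs >= 0.5).astype(int)            # 1 = predicted minority
```

Each member sees every minority instance but only one slice of the majority, so the ensemble is near-balanced per member while remaining diverse across members, which is the property the paper attributes to the approach.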
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 15848