CLIMB: Class-imbalanced Learning Benchmark on Tabular Data

Zhining Liu; Zihao Li; Ze Yang; Tianxin Wei; Jian Kang; Yada Zhu; Hendrik Hamann; Jingrui He; Hanghang Tong

Published: 18 Sept 2025, Last Modified: 30 Oct 2025

Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster)

License: CC BY 4.0

Keywords: Class Imbalance, Imbalanced Learning, Tabular Data

TL;DR: We introduce CLIMB, a comprehensive benchmark and empirical study of 29 class-imbalanced learning methods on 73 real-world tabular datasets, revealing key insights into method performance, efficiency, and robustness.

Abstract: Class-imbalanced learning (CIL) on tabular data is important in many real-world applications where the minority class represents rare but critical outcomes. In this paper, we present CLIMB, a comprehensive benchmark for class-imbalanced learning on tabular data. CLIMB includes 73 real-world datasets across diverse domains and imbalance levels, along with unified implementations of 29 representative CIL algorithms. Built on a high-quality open-source Python package with a unified API design, detailed documentation, and rigorous code quality controls, CLIMB supports easy implementation and comparison of different CIL algorithms. Through extensive experiments, we provide practical insights on method accuracy and efficiency, highlighting the limitations of naive rebalancing, the effectiveness of ensembles, and the importance of data quality. Our code, documentation, and examples are available at https://github.com/ZhiningLiu1998/imbalanced-ensemble.
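To make the terms "imbalance level" and "naive rebalancing" concrete, here is a minimal standard-library sketch. It is not CLIMB's API or any code from the benchmark: the function names `imbalance_ratio` and `random_oversample` and the toy data are hypothetical, and random oversampling is used only as one common example of a naive rebalancing strategy.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Naive rebalancing: duplicate randomly chosen rows of each smaller
    class until every class matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy tabular data (hypothetical): 95 negatives and 5 positives.
rows = [[float(i), float(i % 7)] for i in range(100)]
labels = [1 if i < 5 else 0 for i in range(100)]

print(imbalance_ratio(labels))  # 19.0
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind only repeats existing minority rows, so it adds no new information about the minority class. That is one plausible reading of the limitations the abstract attributes to naive rebalancing, although the abstract itself does not say which limitations it means.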

Code URL: https://github.com/ZhiningLiu1998/imbalanced-ensemble

Primary Area: Other

Submission Number: 371
