Learning Imbalanced Data with Beneficial Label Noise

Guangzheng Hu; Feng Liu; Mingming Gong; Guanghui Wang; Liuhua Peng

Published: 01 May 2025, Last Modified: 18 Jun 2025

Venue: ICML 2025 poster

License: CC BY-NC-SA 4.0

Abstract: Data imbalance is a common factor hindering classifier performance. Data-level approaches for imbalanced learning, such as resampling, often lead to information loss or generative errors. Building on theoretical studies of the imbalance ratio in binary classification, we find that adding suitable label noise can adjust biased decision boundaries and improve classifier performance. This paper proposes the Label-Noise-based Re-balancing (LNR) approach, which addresses imbalanced learning through a novel asymmetric label noise model. In contrast to other data-level methods, LNR alleviates the issues of information loss and generative errors and can be integrated seamlessly with any classifier or algorithm-level method. We validate the superiority of LNR on synthetic and real-world datasets. Our work opens a new avenue for imbalanced learning, highlighting the potential of beneficial label noise.

Lay Summary: Machine learning struggles when one category (like fraudulent transactions) is vastly outnumbered by another (like normal transactions). Traditional fixes, such as deleting common examples or creating fake rare ones, often lose critical information or produce unrealistic data. We propose LNR, a simple but effective solution: we intentionally mislabel a small number of common examples as rare, for example relabeling some suspicious "normal transactions" as "fraudulent", so that the model stops ignoring genuine fraud patterns. Unlike other methods, LNR preserves all original features and avoids both information loss and unrealistic samples. Tests across binary tabular data classification and multi-class image recognition tasks show that LNR consistently improves rare-class recognition. Surprisingly, this shows that not all "label errors" are harmful: when applied strategically, they can enhance fairness. LNR's plug-and-play design makes it broadly applicable to imbalance challenges in healthcare, finance, computer vision, and more, offering an easier way to help machine learning models see the "unseen."
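The relabeling idea described above can be illustrated with a minimal sketch. This is not the authors' LNR implementation (see the linked repository for that); it only shows what asymmetric label noise looks like in code. The function name `flip_majority_labels`, the `minority_score` input (assumed to be some per-sample estimate in [0, 1] of how much a sample resembles the rare class), and the `max_flip_rate` parameter are all hypothetical names introduced for illustration.

```python
import numpy as np

def flip_majority_labels(y, minority_score, max_flip_rate=0.1, seed=0):
    """Illustrative asymmetric label noise (hypothetical helper, not the paper's code).

    Only majority labels (0) can be flipped to the minority label (1);
    minority labels are never changed and features are never touched.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    score = np.clip(np.asarray(minority_score, dtype=float), 0.0, 1.0)

    # Per-sample flip probability: proportional to the score for majority
    # samples, exactly zero for minority samples (this is the asymmetry).
    flip_prob = np.where(y == 0, max_flip_rate * score, 0.0)

    # Draw one Bernoulli trial per sample and flip the selected labels.
    flips = rng.random(y.shape[0]) < flip_prob
    noisy_y = y.copy()
    noisy_y[flips] = 1
    return noisy_y
```

The noisy labels would then be passed to any classifier in place of the original ones, which is what makes a data-level approach of this kind easy to combine with other methods. How the flip probabilities should actually be chosen is the substance of the paper and is not reproduced here.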

Link To Code: https://github.com/guangzhengh/LNR.git

Primary Area: General Machine Learning->Supervised Learning

Keywords: Imbalanced learning, beneficial label noise, classification accuracy, decision boundary

Submission Number: 9268
