Noisy Label Detection and Counterfactual Correction

Wenting Qi, Charalampos Chelmis

Published: 2024, Last Modified: 27 Jan 2026 | IEEE Trans. Artif. Intell. 2024 | CC BY-SA 4.0

Abstract: Data quality is of paramount importance to the training of any machine learning model. Recently proposed approaches for noisy learning focus on detecting noisy labeled data instances by using a fixed loss value threshold and excluding detected noisy data instances in subsequent training steps. However, a predefined, fixed loss value threshold may not be optimal for detecting noisy labeled data, whereas excluding the detected noisy data instances can reduce the size of the training set to such an extent that accuracy can be negatively affected. In this article, we propose Noisy label Detection and Counterfactual Correction (NDCC), a new approach that automatically selects a loss value threshold to identify noisy labeled data instances, and uses counterfactual learning to correct the noisy labels. To the best of our knowledge, NDCC is the first work to explore the use of counterfactual learning in the noisy learning domain. We demonstrate the performance of NDCC on several datasets under a variety of label noise environments. Experimental results show the superiority of the proposed approach compared to the state of the art, especially in the presence of severe label noise.
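Illustration (not from the paper): the abstract contrasts a fixed loss value threshold with one selected automatically, but does not state how NDCC selects it. The sketch below uses Otsu's between-class variance criterion as a stand-in selection rule, purely to show what a data-driven threshold on per-sample losses could look like. The function name `otsu_loss_threshold`, the bin count, and the synthetic loss values are all assumptions made for this example, and the counterfactual correction step is not sketched.

```python
import numpy as np


def otsu_loss_threshold(losses, num_bins=64):
    """Return a loss cut that maximizes between-class variance (Otsu's criterion).

    Stand-in rule for illustration only; this is not NDCC's selection procedure.
    Assumes the losses are not all identical.
    """
    losses = np.asarray(losses, dtype=float)
    hist, edges = np.histogram(losses, bins=num_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = hist / hist.sum()

    # Candidate cuts sit after bins 0..num_bins-2, so both groups are non-empty.
    w0 = np.cumsum(weights)[:-1]               # mass of the low-loss group
    w1 = 1.0 - w0                              # mass of the high-loss group
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]
    cum_mean = cum_mean[:-1]

    between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    best = int(np.argmax(between))
    return edges[best + 1]                     # upper edge of the last low-loss bin


# Synthetic per-sample losses (hypothetical): 800 low-loss and 200 high-loss instances.
rng = np.random.default_rng(0)
losses = np.concatenate([
    rng.normal(0.3, 0.1, size=800),            # stands in for correctly labeled data
    rng.normal(2.0, 0.3, size=200),            # stands in for mislabeled data
])

threshold = otsu_loss_threshold(losses)
flagged = losses > threshold                   # instances treated as noisy candidates
print(f"selected threshold: {threshold:.3f}, flagged: {int(flagged.sum())}")
```

With a rule of this kind the cut adapts to the observed loss distribution rather than being fixed in advance. Per the abstract, the flagged instances would then have their labels corrected rather than being dropped from the training set.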

External IDs: dblp:journals/tai/QiC24