An adaptive and general model for label noise detection using relative probabilistic density

Published: 2022, Last Modified: 26 Aug 2024. Knowl. Based Syst. 2022. License: CC BY-SA 4.0
Abstract: We present a model, called relative probability density (RPD), that detects label noise by exploiting the contrasting characteristics of different classes. RPD has a natural ratio structure, so a powerful density-ratio estimator, the Kullback–Leibler Importance Estimation Procedure (KLIEP), can be applied to it directly instead of estimating the probability densities in the numerator and denominator separately. In addition, the RPD model can be reduced to a new form that contains only P(Y|X), which can be calculated with a probabilistic classifier alone, without relying on any other specific measurements, specific loss functions, noise estimation, or extra parameters. Furthermore, we propose an RPD-based filter learning framework that adaptively optimizes the threshold to accurately identify label noise. Experimental results on synthetic and real data sets demonstrate that the RPD-based filter learning framework is more effective than some representative methods. Its superior generality and adaptiveness, together with its simple design, make it a good replacement for traditional probabilistic classifiers on label-noisy data.
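The abstract does not state the RPD formula, so the following is only an illustrative sketch of how a class-conditional density ratio can be rewritten in terms of P(Y|X). It assumes (this is not confirmed by the abstract) that the score for a sample x with observed label y is p(x|y) / p(x|not y). By Bayes' rule this equals the posterior odds P(y|x) / (1 - P(y|x)) divided by the prior odds P(y) / (1 - P(y)), so only a probabilistic classifier's output and the class priors are needed. The function names, the example numbers, and the fixed threshold of 1.0 are hypothetical; the adaptive threshold optimization described in the abstract is not reproduced here.

```python
from typing import List, Sequence


def relative_density_scores(
    posteriors: Sequence[Sequence[float]],
    labels: Sequence[int],
    priors: Sequence[float],
    eps: float = 1e-12,
) -> List[float]:
    """Score each sample by p(x | y) / p(x | not y), rewritten via Bayes' rule.

    posteriors[i][k] is a classifier's estimate of P(Y = k | x_i),
    labels[i] is the observed (possibly noisy) label of x_i,
    priors[k] is an estimate of P(Y = k).
    NOTE: this ratio is an assumed stand-in, not the paper's exact RPD.
    """
    scores = []
    for probs, y in zip(posteriors, labels):
        # Clip to (eps, 1 - eps) so neither odds term divides by zero.
        p_post = min(max(probs[y], eps), 1.0 - eps)
        p_prior = min(max(priors[y], eps), 1.0 - eps)
        posterior_odds = p_post / (1.0 - p_post)
        prior_odds = p_prior / (1.0 - p_prior)
        scores.append(posterior_odds / prior_odds)
    return scores


def flag_suspects(scores: Sequence[float], threshold: float = 1.0) -> List[bool]:
    """Flag samples whose score falls below a threshold (fixed here, hypothetical)."""
    return [s < threshold for s in scores]


# Hypothetical classifier outputs for three samples of a balanced binary problem.
example_posteriors = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
example_labels = [0, 1, 1]
example_priors = [0.5, 0.5]

example_scores = relative_density_scores(example_posteriors, example_labels, example_priors)
example_flags = flag_suspects(example_scores)
print(example_flags)
```

In this toy input the third sample carries label 1 while the classifier assigns it only 0.3 probability for that class, so its score drops below 1 and it is the only one flagged. In practice the posteriors would come from any calibrated probabilistic classifier, and the threshold would be tuned rather than fixed.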