Abstract: In label-noise learning, estimating the transition matrix is a hot topic because the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., the clean-label transition matrix (CLTM)) has been widely exploited for class-dependent label noise, wherein all samples in a clean class share the same label transition matrix. However, the CLTM cannot handle the more common instance-dependent label noise well, wherein the clean-to-noisy label transition matrix must be estimated at the instance level by taking the input quality into account. Motivated by the fact that classifiers mostly output Bayes optimal labels for prediction, in this paper we study how to directly model the transition from Bayes optimal labels to noisy labels (i.e., the Bayes-label transition matrix (BLTM)) and learn a classifier that predicts Bayes optimal labels. Note that given only noisy data, it is ill-posed to estimate either the CLTM or the BLTM. Favorably, however, Bayes optimal labels carry no uncertainty compared with clean labels: the class posteriors of Bayes optimal labels are one-hot vectors, while those of clean labels are not. This yields two advantages for estimating the BLTM: (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected out of the noisy data; (b) the feasible solution space is much smaller. Exploiting these advantages, this work proposes a parametric model that estimates the instance-dependent label-noise transition matrix with a deep neural network, leading to better generalization and superior classification performance.
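To make the idea concrete, below is a minimal sketch (in PyTorch, which the abstract does not specify) of the kind of parametric model described: a small network maps each instance to its own row-stochastic transition matrix T(x), and the noisy-class posterior is modeled as T(x)^T applied to the one-hot Bayes optimal label of a distilled example. All names here (TransitionNet, bltm_loss, the toy shapes) are hypothetical illustrations, not the authors' actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionNet(nn.Module):
    """Maps an instance embedding to a row-stochastic C x C matrix T(x)."""
    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.num_classes = num_classes
        self.fc = nn.Linear(embed_dim, num_classes * num_classes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        logits = self.fc(z).view(-1, self.num_classes, self.num_classes)
        # Softmax over each row so that row i gives p(noisy label | Bayes label i, x).
        return F.softmax(logits, dim=-1)

def bltm_loss(T: torch.Tensor, bayes_labels: torch.Tensor,
              noisy_labels: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the observed noisy label given the
    (distilled) Bayes optimal label: p(noisy | x) = T(x)[bayes, :]."""
    rows = T[torch.arange(T.size(0)), bayes_labels]  # T(x)^T e_{y*}
    return F.nll_loss(torch.log(rows + 1e-12), noisy_labels)

# Toy usage: 8 examples, 16-dim embeddings, 4 classes.
net = TransitionNet(embed_dim=16, num_classes=4)
z = torch.randn(8, 16)
bayes = torch.randint(0, 4, (8,))   # distilled Bayes optimal labels
noisy = torch.randint(0, 4, (8,))   # observed noisy labels
loss = bltm_loss(net(z), bayes, noisy)
loss.backward()

In this sketch, the loss is computed only on the distilled subset whose Bayes optimal labels are assumed known, matching advantage (a) in the abstract; the learned T(x) can then be used to correct the training loss for the remaining noisy examples.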