Abstract: In label-noise learning, estimating the transition matrix is a hot topic because the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., the clean-label transition matrix (CLTM)) has been widely exploited for class-dependent label noise, wherein all samples in a clean class share the same label transition matrix. However, the CLTM cannot handle the more common instance-dependent label noise well, wherein the clean-to-noisy label transition matrix must be estimated at the instance level by taking the input quality into account. Motivated by the fact that classifiers mostly output Bayes optimal labels for prediction, in this paper we study how to directly model the transition from Bayes optimal labels to noisy labels (i.e., the Bayes-label transition matrix (BLTM)) and learn a classifier that predicts Bayes optimal labels. Note that given only noisy data, it is ill-posed to estimate either the CLTM or the BLTM. Favorably, however, Bayes optimal labels carry no uncertainty compared with clean labels: the class posteriors of Bayes optimal labels are one-hot vectors, while those of clean labels are not. This yields two advantages for estimating the BLTM: (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected out of the noisy data; (b) the feasible solution space is much smaller. Exploiting these advantages, this work proposes a parametric model that estimates the instance-dependent label-noise transition matrix with a deep neural network, leading to better generalization and superior classification performance.
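To make the idea concrete, below is a minimal sketch (in PyTorch, which the abstract does not specify) of the kind of parametric model described: a small network maps each instance to its own row-stochastic transition matrix T(x), and the noisy-class posterior is modeled as T(x)^T applied to the one-hot Bayes optimal label of a distilled example. All names here (TransitionNet, bltm_loss, the toy shapes) are hypothetical illustrations, not the authors' actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionNet(nn.Module):
    """Maps an instance embedding to a row-stochastic C x C matrix T(x)."""
    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.num_classes = num_classes
        self.fc = nn.Linear(embed_dim, num_classes * num_classes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        logits = self.fc(z).view(-1, self.num_classes, self.num_classes)
        # Softmax over each row so that row i gives p(noisy label | Bayes label i, x).
        return F.softmax(logits, dim=-1)

def bltm_loss(T: torch.Tensor, bayes_labels: torch.Tensor,
              noisy_labels: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the observed noisy label given the
    (distilled) Bayes optimal label: p(noisy | x) = T(x)[bayes, :]."""
    rows = T[torch.arange(T.size(0)), bayes_labels]  # T(x)^T e_{y*}
    return F.nll_loss(torch.log(rows + 1e-12), noisy_labels)

# Toy usage: 8 examples, 16-dim embeddings, 4 classes.
net = TransitionNet(embed_dim=16, num_classes=4)
z = torch.randn(8, 16)
bayes = torch.randint(0, 4, (8,))   # distilled Bayes optimal labels
noisy = torch.randint(0, 4, (8,))   # observed noisy labels
loss = bltm_loss(net(z), bayes, noisy)
loss.backward()

In this sketch, the loss is computed only on the distilled subset whose Bayes optimal labels are assumed known, matching advantage (a) in the abstract; the learned T(x) can then be used to correct the training loss for the remaining noisy examples.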