An information-theoretic framework for learning models of instance-independent label noise

28 Sept 2020 (modified: 05 May 2023), ICLR 2021 Conference Blind Submission
Keywords: label noise, noise transition matrix, entropy, information theory, local intrinsic dimensionality
Abstract: Given a dataset $\mathcal{D}$ with label noise, how do we learn its underlying noise model? If we assume that the label noise is instance-independent, then the noise model can be represented by a noise transition matrix $Q_{\mathcal{D}}$. Recent work has shown that even without further information about any instances with correct labels, or further assumptions on the distribution of the label noise, it is still possible to estimate $Q_{\mathcal{D}}$ while simultaneously learning a classifier from $\mathcal{D}$. However, such approaches presuppose that estimating $Q_{\mathcal{D}}$ well requires an accurate classifier. In this paper, we show that high classification accuracy is in fact not required to estimate $Q_{\mathcal{D}}$ well. We introduce an information-theoretic framework for estimating $Q_{\mathcal{D}}$ solely from $\mathcal{D}$, without additional information or assumptions. At the heart of our framework is a discriminator that predicts whether an input dataset has maximum Shannon entropy; we apply this discriminator to multiple new datasets $\hat{\mathcal{D}}$ synthesized from $\mathcal{D}$ by inserting additional label noise. We prove that our estimator for $Q_{\mathcal{D}}$ is statistically consistent in both the dataset size and the number of intermediate datasets $\hat{\mathcal{D}}$ synthesized from $\mathcal{D}$. As a concrete realization of our framework, we incorporate local intrinsic dimensionality (LID) into the discriminator, and we show experimentally that our LID-based discriminator significantly reduces the estimation error for $Q_{\mathcal{D}}$: on CIFAR10 with symmetric noise and $40\%$ of anchor-like samples removed, the average Kullback--Leibler loss drops from $0.27$ to $0.17$. Although our framework requires no clean subset of $\mathcal{D}$, we show that it can also take advantage of clean data, when available, to improve upon existing estimation methods.
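To make the abstract's ingredients concrete, the following is a minimal NumPy sketch, not the authors' implementation, of three building blocks the abstract names: inserting instance-independent label noise via a transition matrix, a Shannon-entropy check (simplified here to the entropy of the empirical label marginal; the paper's exact discriminator may be defined differently), and the standard maximum-likelihood LID estimator of Amsaleg et al. (2015), which an LID-based discriminator could build on. All function names and the noise level `eps` are illustrative assumptions.

```python
import numpy as np

def inject_label_noise(labels, Q, rng=None):
    """Resample each label y from row Q[y] of a row-stochastic
    transition matrix Q: instance-independent label noise."""
    rng = np.random.default_rng(rng)
    n_classes = Q.shape[0]
    return np.array([rng.choice(n_classes, p=Q[y]) for y in labels])

def label_entropy(labels, n_classes):
    """Shannon entropy (in nats) of the empirical label marginal;
    it is maximal, log(n_classes), when labels are uniform."""
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def lid_mle(knn_distances):
    """Maximum-likelihood LID estimate for one point from its sorted
    (ascending, positive) distances to its k nearest neighbours."""
    d = np.asarray(knn_distances, dtype=float)
    return -len(d) / float(np.sum(np.log(d / d[-1])))

# Toy check: inserting heavy symmetric noise pushes a skewed label
# marginal toward uniform, i.e. toward maximum Shannon entropy.
rng = np.random.default_rng(0)
C = 10
y = rng.choice(C, size=5000, p=[0.3] + [0.7 / 9] * 9)
eps = 0.9  # hypothetical noise level, for illustration only
Q_extra = (1 - eps) * np.eye(C) + (eps / C) * np.ones((C, C))
y_noisy = inject_label_noise(y, Q_extra, rng=rng)
print(label_entropy(y, C), label_entropy(y_noisy, C), np.log(C))
```

The toy check at the end illustrates the intuition behind the maximum-entropy criterion: as more symmetric noise is inserted, the labels carry less information about the instances and the entropy approaches its maximum $\log C$.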
One-sentence Summary: We introduce a consistent information-theoretic estimator for the noise transition matrix of any dataset with instance-independent label noise, without assuming any matrix structure and without requiring anchor points or clean data.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=5Srbb9XAqh