AutoCleansing: Unbiased Estimation of Deep Learning with Mislabeled Data

Koichi Kuriyama

28 Sept 2020 (modified: 05 May 2023) | ICLR 2021 Conference Blind Submission | Readers: Everyone

Keywords: Automatic Data Cleansing, Incorrect Labels, Multiple Objects

Abstract: Mislabeled samples cause prediction errors. This study proposes AutoCleansing, a method that automatically captures the effect of incorrect labels and mitigates it without removing the mislabeled samples. AutoCleansing consists of a base network model and sample-category-specific constants. The parameters of the base model and the sample-category constants are estimated simultaneously from the training data. Predictions for test data are then made using the base model alone, without the constants that capture the mislabeling effects. A theoretical model for AutoCleansing is developed, showing that the gradient of the loss function of the proposed method can be zero at the true parameters even with mislabeled data, provided the model is correctly constructed. Experimental results show that AutoCleansing achieves higher test accuracy than previous studies on the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.
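
The training and prediction split described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the sample-category constants are added to the logits, swaps the deep base network for a linear softmax classifier, and adds an L2 penalty on the constants. The names `train`, `predict`, and `const_penalty` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, n_classes, lr=0.5, epochs=200, const_penalty=0.1, seed=0):
    """Jointly fit base weights W and per-sample, per-category constants C.

    Assumption: C is added to the base logits during training only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(d, n_classes))  # linear stand-in for the base network
    C = np.zeros((n, n_classes))                     # one constant per (sample, category)
    Y = np.eye(n_classes)[y]                         # one-hot (possibly noisy) labels
    for _ in range(epochs):
        P = softmax(X @ W + C)
        G = P - Y                          # cross-entropy gradient w.r.t. each sample's logits
        W -= lr * (X.T @ G) / n            # base-model step, averaged over samples
        C -= lr * (G + const_penalty * C)  # per-sample step with the assumed L2 penalty
    return W, C

def predict(X, W):
    # Test-time prediction uses the base model only; the constants are dropped.
    return (X @ W).argmax(axis=1)
```

In this sketch the constants can absorb the residual on samples whose label disagrees with the base model, while `predict` ignores them entirely, mirroring the abstract's statement that test predictions use the base model without the constants.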

One-sentence Summary: AutoCleansing can capture the effect of incorrect labels and mitigate it without removing the mislabeled samples.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Reviewed Version (pdf): https://openreview.net/references/pdf?id=Ia6A34pNtV
