Keywords: Automatic Data Cleansing, Incorrect Labels, Multiple Objects
Abstract: Mislabeled samples cause prediction errors. This study proposes a solution to the problem of incorrect labels, called AutoCleansing, to automatically capture the effect of incorrect labels and mitigate it without removing the mislabeled samples. AutoCleansing consists of a base network model and sample-category specific constants. Both parameters of the base model and sample-category constants are estimated simultaneously using the training data. Thereafter, predictions for test data are made using a base model without the constants capturing the mislabeled effects. A theoretical model for AutoCleansing is developed and showing that the gradient of the loss function of the proposed method can be zero at true parameters with mislabeled data if the model is correctly constructed. Experimental results show that AutoCleansing has better performance in test accuracy than previous studies for CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.
One-sentence Summary: AutoCleansing can capture the effect of incorrect labels and mitigate it without removing the mislabeled samples.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=Ia6A34pNtV
12 Replies
Loading