A Realistic Simulation Framework for Learning with Label Noise

25 May 2021 (modified: 22 Oct 2023)
Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1)
Readers: Everyone
Keywords: noisy labels, simulation, datasets, rater features
TL;DR: We propose a framework for generating realistic synthetic datasets for learning with label noise.
Abstract: We propose a simulation framework for generating realistic instance-dependent noisy labels via a pseudo-labeling paradigm. By comparison with the CIFAR10-H dataset, we show that this framework generates synthetic noisy labels that exhibit important characteristics of label noise in practical settings. Equipped with controllable label noise, we study the negative impact of noisy labels across a few realistic settings to understand when label noise is more problematic. Additionally, using the annotator information available from our simulation framework, we propose a new technique, Label Quality Model (LQM), that leverages annotator features to predict and correct noisy labels. We show that adding LQM as a label correction step before applying existing noisy-label techniques further improves model performance.
Supplementary Material: zip
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2107.11413/code)