A Realistic Simulation Framework for Learning with Label Noise

Keren Gu; Xander Masotto; Vandana Bachani; Balaji Lakshminarayanan; Jack Nikodem; Dong Yin

25 May 2021 (modified: 25 Nov 2024). Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1). Readers: Everyone

Keywords: noisy labels, simulation, datasets, rater features

TL;DR: We propose a framework for generating realistic synthetic datasets for learning with label noise.

Abstract: We propose a simulation framework that generates realistic, instance-dependent noisy labels via a pseudo-labeling paradigm. By comparison with the CIFAR10-H dataset, we show that the synthetic noisy labels it produces exhibit important characteristics of the label noise found in practical settings. With controllable label noise, we study the negative impact of noisy labels across several realistic settings to understand when label noise is most problematic. Additionally, because our simulation framework makes annotator information available, we propose a new technique, Label Quality Model (LQM), which leverages annotator features to predict and correct noisy labels. We show that adding LQM as a label correction step before applying existing noisy-label techniques further improves model performance.
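As a rough illustration of what "instance-dependent" label noise means, the sketch below draws each noisy label from a per-example class distribution, so that ambiguous examples are mislabeled more often than clear ones. It is not the authors' implementation: the probability vectors stand in for the outputs of a hypothetical pseudo-labeling ("rater") model, and the names `sample_noisy_labels` and `noise_rate` are assumptions made for this example only.

```python
import random


def sample_noisy_labels(pseudo_probs, seed=0):
    """Draw one noisy label per instance from that instance's own class distribution.

    pseudo_probs: list of per-instance probability vectors (assumed to come from
    some pseudo-labeling model; hypothetical here).
    """
    rng = random.Random(seed)
    labels = []
    for probs in pseudo_probs:
        classes = range(len(probs))
        # Sampling (rather than taking the argmax) lets ambiguous instances flip more often.
        labels.append(rng.choices(classes, weights=probs, k=1)[0])
    return labels


def noise_rate(noisy, clean):
    """Fraction of labels that disagree with the clean labels."""
    return sum(n != c for n, c in zip(noisy, clean)) / len(clean)


# Toy data (made up for illustration): every instance has true class 0.
easy = [0.98, 0.01, 0.01]  # the hypothetical rater model is confident
hard = [0.40, 0.35, 0.25]  # the hypothetical rater model is unsure
pseudo_probs = [easy] * 1000 + [hard] * 1000
clean = [0] * 2000

noisy = sample_noisy_labels(pseudo_probs, seed=0)
easy_rate = noise_rate(noisy[:1000], clean[:1000])
hard_rate = noise_rate(noisy[1000:], clean[1000:])
print(f"noise rate on easy instances: {easy_rate:.3f}")
print(f"noise rate on hard instances: {hard_rate:.3f}")
```

In this toy setup the noise rate depends on the instance rather than only on the class, which is the property that distinguishes instance-dependent noise from uniform or class-conditional label flipping.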

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/a-realistic-simulation-framework-for-learning/code)
