Resolving Training Biases via Influence-based Data Relabeling

Shuming Kong; Yanyan Shen; Linpeng Huang

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 | ICLR 2022 Oral | Readers: Everyone

Keywords: Training bias, influence functions, data relabeling

Abstract: The performance of supervised learning methods suffers easily from training bias caused by train-test distribution mismatch or label noise. The influence function is a technique that estimates the impact of a training sample on the model's predictions. Recent studies on data resampling have employed influence functions to identify harmful training samples that degrade the model's test performance. They have shown that discarding or downweighting the identified harmful training samples is an effective way to resolve training biases. In this work, we move one step forward and propose an influence-based relabeling framework named RDIA for reusing harmful training samples toward better model performance. To achieve this, we use influence functions to estimate how relabeling a training sample would affect the model's test performance and further develop a novel relabeling function R. We theoretically prove that, for any classification task using cross-entropy loss, applying R to relabel harmful training samples allows the model to achieve lower test loss than simply discarding them. Extensive experiments on ten real-world datasets demonstrate that RDIA outperforms state-of-the-art data resampling methods and improves the model's robustness against label noise.
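
To make the notion of a harmful training sample concrete, here is a minimal sketch of the standard first-order influence estimate, score(z) = -grad L(z)^T H^{-1} grad L_test, on a small L2-regularised logistic regression. It is not the RDIA implementation and it does not reproduce the relabeling function R, which the abstract does not specify. The synthetic data, the function names (`fit_logreg`, `influence_on_test_loss`) and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logreg(X, y, l2=0.1, lr=0.5, steps=5000):
    """Plain gradient descent on the mean cross-entropy loss plus (l2/2)*||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / n + l2 * w)
    return w

def influence_on_test_loss(X, y, X_test, y_test, w, l2=0.1):
    """Per-sample score: -grad L(z_i)^T H^{-1} grad L_test.

    A positive score means that upweighting sample i is estimated to raise
    the test loss, so the sample is a candidate harmful sample.
    """
    n, d = X.shape
    p = sigmoid(X @ w)
    # Hessian of the regularised training objective at w.
    H = (X.T * (p * (1.0 - p))) @ X / n + l2 * np.eye(d)
    # Gradient of each training sample's cross-entropy loss, shape (n, d).
    per_sample_grads = X * (p - y)[:, None]
    # Gradient of the mean test loss, shape (d,).
    test_grad = X_test.T @ (sigmoid(X_test @ w) - y_test) / len(y_test)
    return -per_sample_grads @ np.linalg.solve(H, test_grad)

# Synthetic data (assumption): linear ground truth, 20% of training labels flipped.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(300, 3))
y_clean = (X @ w_true > 0).astype(float)
flipped = rng.random(300) < 0.2
y = np.where(flipped, 1.0 - y_clean, y_clean)

X_test = rng.normal(size=(200, 3))
y_test = (X_test @ w_true > 0).astype(float)  # test labels are left clean

w = fit_logreg(X, y)
influences = influence_on_test_loss(X, y, X_test, y_test, w)
harmful = influences > 0  # candidates to discard, downweight, or relabel
print("mean score, flipped labels:", influences[flipped].mean())
print("mean score, clean labels:  ", influences[~flipped].mean())
```

In this toy setting the label-flipped samples tend to receive the larger scores. A resampling method would discard or downweight the samples marked in `harmful`, whereas the framework described in the abstract would instead relabel them with R.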

One-sentence Summary: We propose an influence-based relabeling framework for resolving training bias, with a theoretical guarantee.

Supplementary Material: zip
