When Optimizing $f$-Divergence is Robust with Label Noise

Jiaheng Wei; Yang Liu

Published: 12 Jan 2021, Last Modified: 22 Jun 2025, ICLR 2021 Poster, Readers: Everyone

Keywords: $f$-divergence, robustness, learning with noisy labels

Abstract: We show when maximizing a properly defined $f$-divergence measure with respect to a classifier's predictions and the supervised labels is robust with label noise. Leveraging its variational form, we derive a nice decoupling property for a family of $f$-divergence measures when label noise is present: the divergence is shown to be a linear combination of the variational difference defined on the clean distribution and a bias term introduced by the noise. This derivation helps us analyze the robustness of different $f$-divergence functions. Once their robustness is established, this family of $f$-divergence functions serves as a set of useful metrics for learning with noisy labels, and these metrics do not require specifying the labels' noise rate. When they may not be robust, we propose fixes to make them robust. In addition to the analytical results, we present thorough experimental evidence. Our code is available at https://github.com/UCSC-REAL/Robust-f-divergence-measures.
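The abstract builds on the variational (Fenchel-dual) form of an $f$-divergence, $D_f(P \| Q) = \sup_g \mathbb{E}_{Z \sim P}[g(Z)] - \mathbb{E}_{Z \sim Q}[f^*(g(Z))]$. The snippet below is a minimal sketch of that variational difference for a fixed critic, not the authors' implementation (see the repository above for that). It assumes the KL generator $f(u) = u \log u$, whose conjugate is $f^*(t) = e^{t-1}$; takes $P$ to be the observed (prediction, label) pairs and $Q$ the product of their marginals, approximated by pairing every prediction with every label; and uses a hypothetical agreement critic. The names `kl_conjugate`, `variational_f_divergence`, and `agreement_critic` are illustrative only.

```python
import math
from typing import Callable, Sequence


def kl_conjugate(t: float) -> float:
    """Fenchel conjugate of f(u) = u * log(u) (the KL generator): f*(t) = exp(t - 1)."""
    return math.exp(t - 1.0)


def variational_f_divergence(
    preds: Sequence[int],
    labels: Sequence[int],
    critic: Callable[[int, int], float],
    conjugate: Callable[[float], float] = kl_conjugate,
) -> float:
    """Compute E_P[g] - E_Q[f*(g)] for a fixed critic g.

    P: the observed joint pairs (pred_i, label_i).
    Q: the product of marginals, approximated by pairing every prediction
       with every label.
    For any critic, this is a lower bound on D_f between these empirical
    distributions; maximizing over critics tightens it.
    """
    n = len(preds)
    if n == 0 or n != len(labels):
        raise ValueError("preds and labels must be non-empty and of equal length")
    # Expectation of the critic under the joint distribution P.
    joint_term = sum(critic(p, y) for p, y in zip(preds, labels)) / n
    # Expectation of f*(critic) under the product of marginals Q.
    product_term = sum(conjugate(critic(p, y)) for p in preds for y in labels) / (n * n)
    return joint_term - product_term


def agreement_critic(pred: int, label: int) -> float:
    """Hypothetical critic: scores 1.0 when prediction and label agree, else 0.0."""
    return 1.0 if pred == label else 0.0


if __name__ == "__main__":
    preds = [0, 1, 0, 1]
    clean_labels = [0, 1, 0, 1]
    noisy_labels = [0, 1, 1, 0]  # two of the four labels flipped
    print(variational_f_divergence(preds, clean_labels, agreement_critic))
    print(variational_f_divergence(preds, noisy_labels, agreement_critic))
```

With these toy inputs the clean pairing yields $0.5 - 0.5e^{-1} \approx 0.316$, while flipping two labels lowers the estimate to $-0.5e^{-1} \approx -0.184$. The $Q$ term is unchanged because flipping labels here does not change the label marginal, so the drop comes entirely from the joint term. This toy illustrates only the variational difference itself, not the paper's decoupling result.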

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: We show when maximizing a properly defined $f$-divergence measure with respect to a classifier's predictions and the supervised labels is robust with label noise.

Supplementary Material: zip

Code: [weijiaheng/Robust-f-divergence-measures](https://github.com/weijiaheng/Robust-f-divergence-measures) + [Papers with Code: 1 community implementation](https://paperswithcode.com/paper/?openreview=WesiCoRVQ15)

Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [CIFAR-100](https://paperswithcode.com/dataset/cifar-100), [CIFAR-100N](https://paperswithcode.com/dataset/cifar-100n), [CIFAR-10N](https://paperswithcode.com/dataset/cifar-10n), [Clothing1M](https://paperswithcode.com/dataset/clothing1m), [Fashion-MNIST](https://paperswithcode.com/dataset/fashion-mnist), [MNIST](https://paperswithcode.com/dataset/mnist)

Community Implementations: [CatalyzeX: 1 code implementation](https://www.catalyzex.com/paper/when-optimizing-f-divergence-is-robust-with/code)
