Open-sampling: Re-balancing Long-tailed Datasets with Out-of-Distribution Data

29 Sept 2021 (modified: 13 Feb 2023)
ICLR 2022 Conference Withdrawn Submission
Readers: Everyone
Keywords: long-tailed recognition, Out-of-Distribution, open-set noisy labels, deep learning
Abstract: Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance. To handle this issue, popular re-sampling methods generally require in-distribution data to balance the class priors. However, obtaining suitable in-distribution data with precise labels for selected classes is challenging. In this paper, we show theoretically, from a Bayesian perspective, that out-of-distribution data (i.e., open-set samples) can be leveraged to augment the minority classes. Based on this motivation, we propose a novel method called Open-sampling, which utilizes open-set noisy labels to re-balance the class priors of the training dataset. For each open-set instance, the label is sampled from a pre-defined distribution that is complementary to the original class priors. Furthermore, class-dependent weights are generated to provide stronger regularization on the minority classes than on the majority classes. We empirically show that Open-sampling not only re-balances the class priors but also encourages the neural network to learn separable representations. Extensive experiments on benchmark datasets demonstrate that our proposed method significantly outperforms existing data re-balancing methods and can be easily incorporated into existing state-of-the-art methods to enhance their performance.
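The label-sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the complementary distribution, so the code below assumes one plausible choice: a class's sampling probability is proportional to the gap between the largest class prior and its own prior. The function names `complementary_distribution` and `sample_open_set_labels` are hypothetical and are not taken from the paper.

```python
import numpy as np


def complementary_distribution(class_counts):
    """Assumed complementary distribution: rarer classes get higher probability.

    This is an illustrative choice, not necessarily the paper's definition.
    """
    counts = np.asarray(class_counts, dtype=float)
    priors = counts / counts.sum()
    # Gap to the most frequent class; the majority class gets zero mass.
    gap = priors.max() - priors
    if gap.sum() == 0:
        # Already balanced: fall back to a uniform distribution.
        return np.full_like(priors, 1.0 / len(priors))
    return gap / gap.sum()


def sample_open_set_labels(class_counts, num_open_set, seed=0):
    """Draw one noisy label per open-set instance from the assumed distribution."""
    rng = np.random.default_rng(seed)
    probs = complementary_distribution(class_counts)
    return rng.choice(len(probs), size=num_open_set, p=probs)


if __name__ == "__main__":
    counts = [100, 10, 1]  # long-tailed toy class counts
    print(complementary_distribution(counts))
    print(sample_open_set_labels(counts, num_open_set=8))
```

Under this assumption, the most frequent class never receives an open-set label, and the rarest class receives the most, which nudges the effective class priors toward balance.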
One-sentence Summary: We propose a simple yet effective method to leverage out-of-distribution data for class-imbalanced learning.
Supplementary Material: zip
