# Robust Long-Tailed Learning under Label Noise

This repository is the official implementation of *Robust Long-Tailed Learning under Label Noise*.

Our main contributions are:

> (i) We study the problem of long-tailed learning under label noise, which is less explored and is a significant step towards real-world applications; 
> (ii) We find that the commonly used small-loss trick fails in long-tailed learning. Thus, we establish a novel prototypical noise detection method that overcomes the limitations of the small-loss trick; 
> (iii) We propose a robust framework, RoLT. It realizes noise detection that is immune to the label distribution, and compensates for the data scarcity of tail classes. Our framework can be built on top of semi-supervised learning methods without much extra overhead, leading to an improved approach, RoLT+. The proposed methods achieve strong empirical performance on benchmark and real-world datasets.

This repository includes:
- Code for Robust Long-Tailed Learning (RoLT).
- Code for RoLT+, which combines RoLT with a semi-supervised learning framework.

## Requirements

* Python 3.6

* [PyTorch](https://pytorch.org/) 1.6.0

* numpy 1.19.2

* scikit-learn 0.24.2

Other dependencies are partially listed in `requirements.txt`. To install the requirements:

```sh
pip install -r requirements.txt
```

## Datasets
We use the CIFAR-10 and CIFAR-100 datasets. For RoLT, the data will be automatically downloaded and converted.

To run RoLT+, first copy and extract the datasets to the proper directory using the following script:

```bash
sh copy_data.sh
```

## Usage
### Baseline
To train and evaluate a baseline model, run the following commands:
```bash
# Vanilla ERM for Long-tailed CIFAR10 with label noise
python main.py --cfg ./config/ImbalanceCifar10/feat_uniform.yaml --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5

# Vanilla ERM for Long-tailed CIFAR100 with label noise
python main.py --cfg ./config/ImbalanceCifar100/feat_uniform.yaml --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5

# Vanilla ERM-DRW for Long-tailed CIFAR10 with label noise
python main.py --cfg ./config/ImbalanceCifar10/feat_uniformdrw.yaml --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5

# Vanilla ERM-DRW for Long-tailed CIFAR100 with label noise
python main.py --cfg ./config/ImbalanceCifar100/feat_uniformdrw.yaml --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5
```
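
The `--imb_type exp --imb_factor 0.01` flags request an exponentially decaying class-size profile. As a rough illustration, the sketch below computes per-class sample counts under the convention used in LDAM-DRW (one of the repositories acknowledged below). It is an assumption that this repository follows the same formula; the function name `long_tailed_class_counts` is hypothetical and not part of the codebase.

```python
def long_tailed_class_counts(n_max, num_classes, imb_factor):
    """Per-class sample counts under an exponential imbalance profile.

    Assumed convention (LDAM-DRW style): class i keeps
    n_max * imb_factor ** (i / (num_classes - 1)) samples.
    Requires num_classes > 1.
    """
    counts = []
    for cls_idx in range(num_classes):
        # Class 0 keeps all n_max samples; the last class keeps n_max * imb_factor.
        frac = imb_factor ** (cls_idx / (num_classes - 1.0))
        counts.append(int(n_max * frac))
    return counts


# CIFAR-10 has 5000 training images per class before subsampling.
counts = long_tailed_class_counts(n_max=5000, num_classes=10, imb_factor=0.01)
print(counts[0], counts[-1])  # prints 5000 50
```

Under this assumption, an imbalance factor of 0.01 means the smallest class has 100 times fewer samples than the largest one.
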
### Robust Long-Tailed Learning under Label Noise (RoLT)

We provide scripts for the long-tailed learning methods combined with prototypical noise detection and soft pseudo-labeling, as reported in our study. To train and evaluate a RoLT model, run the following commands (we use `--imb_factor 0.01` and `--noise_ratio 0.5` as examples):

```bash
# RoLT for Long-tailed CIFAR10 with label noise
python main.py --cfg ./config/ImbalanceCifar10/feat_uniform.yaml --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5 --cleaning

# RoLT-DRW for Long-tailed CIFAR10 with label noise
python main.py --cfg ./config/ImbalanceCifar10/feat_uniformdrw.yaml --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5 --cleaning

# RoLT for Long-tailed CIFAR100 with label noise
python main.py --cfg ./config/ImbalanceCifar100/feat_uniform.yaml --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5 --cleaning

# RoLT-DRW for Long-tailed CIFAR100 with label noise
python main.py --cfg ./config/ImbalanceCifar100/feat_uniformdrw.yaml --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5 --cleaning
```
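
To give a feel for what prototype-based noise detection means, here is a minimal NumPy sketch. It is not the implementation behind `--cleaning`: the helper `flag_noisy_by_prototype` and its `ratio` parameter are hypothetical, and the per-class rule (distance larger than a multiple of the class median) is a deliberately simple stand-in for the detection criterion used in the paper. What it does illustrate is that each sample is compared against the prototype of its labelled class, with a threshold computed per class rather than globally.

```python
import numpy as np


def flag_noisy_by_prototype(features, labels, ratio=3.0):
    """Flag samples that sit unusually far from their labelled class prototype.

    Hypothetical helper for illustration only; the rule
    `distance > ratio * class median distance` is an assumption.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    noisy = np.zeros(len(labels), dtype=bool)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        prototype = features[idx].mean(axis=0)  # class prototype = mean feature
        dist = np.linalg.norm(features[idx] - prototype, axis=1)
        # The threshold is computed per class, so a small tail class is judged
        # against its own spread rather than against the head classes.
        noisy[idx] = dist > ratio * np.median(dist)
    return noisy


# Toy 2-D "features": a head class on the unit circle, a tail class near (10, 10).
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
head = np.stack([np.cos(angles), np.sin(angles)], axis=1)
tail = np.array([[9.5, 9.5], [10.5, 9.5], [9.5, 10.5], [10.5, 10.5]])
mislabeled = np.array([[10.0, 10.0], [10.2, 9.8]])  # tail-like, but labelled as head

features = np.concatenate([head, tail, mislabeled])
labels = np.array([0] * 20 + [1] * 4 + [0] * 2)

print(np.flatnonzero(flag_noisy_by_prototype(features, labels)))  # prints [24 25]
```

In this toy example the four clean tail samples are kept even though the tail class is five times smaller than the head class, while the two tail-like samples labelled as head are flagged.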

#### Combining RoLT with Semi-Supervised Learning (RoLT+)

```bash
# DivideMix for Long-tailed CIFAR10 with label noise
python Train_cifar.py --dataset cifar10 --arch resnet18 --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5 -b loss

# RoLT+ for Long-tailed CIFAR10 with label noise
python Train_cifar.py --dataset cifar10 --arch resnet18 --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5 --cls_ind -b dist

# DivideMix for Long-tailed CIFAR100 with label noise
python Train_cifar.py --dataset cifar100 --arch resnet18 --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5 -b loss

# RoLT+ for Long-tailed CIFAR100 with label noise
python Train_cifar.py --dataset cifar100 --arch resnet18 --imb_type exp --imb_factor 0.01 --noise_mode imb --noise_ratio 0.5 --cls_ind -b dist
```
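
When running several configurations, it can be convenient to generate the command lines from a small script. The sketch below only prints commands and uses exactly the flags shown above; the helper name `rolt_plus_command` is hypothetical, and whether `Train_cifar.py` accepts values other than those listed in this README is an assumption to check against the script's argument parser.

```python
import shlex


def rolt_plus_command(dataset, noise_ratio=0.5, imb_factor=0.01):
    """Build the RoLT+ command line shown in this README as a list of arguments."""
    return [
        "python", "Train_cifar.py",
        "--dataset", dataset,
        "--arch", "resnet18",
        "--imb_type", "exp",
        "--imb_factor", str(imb_factor),
        "--noise_mode", "imb",
        "--noise_ratio", str(noise_ratio),
        "--cls_ind",
        "-b", "dist",
    ]


for dataset in ("cifar10", "cifar100"):
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in rolt_plus_command(dataset)))
```

The argument list can also be passed directly to `subprocess.run` instead of being printed.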

## Acknowledgement
We thank the authors of the following repositories for code reference:
[LDAM-DRW](https://github.com/kaidic/LDAM-DRW), 
[OLTR](https://github.com/zhmiao/OpenLongTailRecognition-OLTR), 
[Classifier-Balancing](https://github.com/facebookresearch/classifier-balancing), 
[DivideMix](https://github.com/LiJunnan1992/DivideMix), etc.