# Deep Learning Requires Explicit Regularization for Reliable Predictive Probability

This repository is the official implementation of the ICLR 2021 submission "Deep Learning Requires Explicit Regularization for Reliable Predictive Probability."

This work shows that regularization methods that constrain the predictive confidence during training significantly improve calibration performance and the ability to represent uncertainty on out-of-distribution samples.

We consider the following regularization losses, which are added to the cross-entropy loss:
- $L^1$ logit regularization
- $L^2$ logit regularization
- Sliced Wasserstein distance of order one between the empirical distribution of logits and the standard normal distribution
- Projected error function regularization
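
As a rough illustration of how the first two penalties combine with the cross-entropy loss, the sketch below adds an $L^p$ norm of the logits, scaled by a coefficient, to the batch-averaged cross-entropy. It is not taken from `train.py`: the function names, the averaging of the norm over the batch, and the toy logits are assumptions of this example, and the coefficients only mirror the `--reg_coeff` values used in the training commands of this README.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def logit_norm(logits, p):
    """L^p norm of one logit vector (p = 1 or p = 2 here)."""
    return sum(abs(z) ** p for z in logits) ** (1.0 / p)


def regularized_loss(batch_logits, labels, reg_coeff, p):
    """Mean cross-entropy plus reg_coeff times the mean L^p logit norm.

    Averaging the norm over the batch is an assumption of this sketch.
    """
    n = len(labels)
    ce = sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / n
    penalty = sum(logit_norm(z, p) for z in batch_logits) / n
    return ce + reg_coeff * penalty


# Toy batch of two samples with three classes (illustrative values only).
batch_logits = [[4.0, -1.0, 0.5], [0.2, 3.0, -2.0]]
labels = [0, 1]

vanilla = regularized_loss(batch_logits, labels, reg_coeff=0.0, p=1)
l1 = regularized_loss(batch_logits, labels, reg_coeff=0.01, p=1)
l2 = regularized_loss(batch_logits, labels, reg_coeff=0.003, p=2)
```

Because the penalty grows with the magnitude of the logits, minimizing it discourages the very large logits that lead to overconfident softmax outputs. The sliced Wasserstein and projected error function regularizers are not covered by this sketch.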


Metrics for evaluating the reliability of the predictive probability:
- Negative log-likelihood (NLL)
- Expected calibration error (ECE)
- Distribution plot for out-of-distribution samples
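
For readers unfamiliar with the first two metrics, here is a minimal sketch of how NLL and ECE are commonly computed from predicted class probabilities. It is independent of `eval.py`: the function names, the equal-width binning, and the default of 15 bins are assumptions of this example, and the repository may use different settings or report ECE on a different scale.

```python
import math


def nll(probs, labels):
    """Average negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def ece(probs, labels, n_bins=15):
    """Expected calibration error with equal-width confidence bins.

    n_bins=15 is an assumption of this sketch, not a value read from the repository.
    """
    # Per bin: [sample count, sum of confidences, number of correct predictions]
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += float(pred == y)
    n = len(labels)
    # (count / n) * |accuracy - mean confidence| simplifies to |correct - conf_sum| / n
    return sum(abs(correct - conf_sum) for count, conf_sum, correct in bins if count) / n


# Toy predictions for a two-class problem (illustrative values only).
probs = [[0.95, 0.05], [0.65, 0.35], [0.15, 0.85], [0.55, 0.45]]
labels = [0, 1, 1, 0]

toy_nll = nll(probs, labels)
toy_ece = ece(probs, labels)
```

A lower NLL rewards putting high probability on the correct class, while a lower ECE means that the reported confidence tracks the observed accuracy more closely.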

## Requirements
The following command installs all necessary packages:

```setup
pip install -r requirements.txt
```

## Datasets
All datasets used in the experiments (CIFAR and SVHN) are downloaded automatically through the torchvision library.

## Training
To train ResNet-50 on CIFAR-10 and CIFAR-100 under different regularizers, run these commands:

```resnet_train
# Vanilla
python train.py  --exp_num 0 --arch resnet50 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py  --exp_num 0 --arch resnet50 --dataset cifar100 --save-dir benchmark --gpu 0

# L1 regularization
python train.py --use_l1 --reg_coeff 0.01 --exp_num 0 --arch resnet50 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py --use_l1 --reg_coeff 0.01 --exp_num 0 --arch resnet50 --dataset cifar100 --save-dir benchmark --gpu 0

# L2 regularization
python train.py --use_l2 --reg_coeff 0.003 --exp_num 0 --arch resnet50 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py --use_l2 --reg_coeff 0.01 --exp_num 0 --arch resnet50 --dataset cifar100 --save-dir benchmark --gpu 0

# Sliced Wasserstein regularization
python train.py --use_wass --reg_coeff 0.001 --exp_num 0 --arch resnet50 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py --use_wass --reg_coeff 0.01 --exp_num 0 --arch resnet50 --dataset cifar100 --save-dir benchmark --gpu 0

# Projected error function regularization
python train.py --use_per --reg_coeff 0.03 --exp_num 0 --arch resnet50 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py --use_per --reg_coeff 1.0 --exp_num 0 --arch resnet50 --dataset cifar100 --save-dir benchmark --gpu 0
```

To train VGG-16 on CIFAR-10 and CIFAR-100 under different regularizers, run these commands:
```vgg_train
# Vanilla
python train.py  --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py  --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar100 --save-dir benchmark --gpu 0

# L1 regularization
python train.py --use_l1 --reg_coeff 0.01 --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py --use_l1 --reg_coeff 0.003 --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar100 --save-dir benchmark --gpu 0

# L2 regularization
python train.py --use_l2 --reg_coeff 0.003 --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py --use_l2 --reg_coeff 0.01 --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar100 --save-dir benchmark --gpu 0

# Sliced Wasserstein regularization
python train.py --use_wass --reg_coeff 0.001 --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py --use_wass --reg_coeff 0.03 --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar100 --save-dir benchmark --gpu 0

# Projected error function regularization
python train.py --use_per --reg_coeff 0.003 --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar10 --save-dir benchmark --gpu 0
python train.py --use_per --reg_coeff 1.0 --weightdecay 5e-4 --exp_num 0 --arch vgg16 --dataset cifar100 --save-dir benchmark --gpu 0
```

## Evaluation
The following commands evaluate the pre-trained models and produce accuracy, NLL, ECE, and the distribution plot of the predictive uncertainty for in-distribution samples (CIFAR-10 or CIFAR-100) and out-of-distribution samples (SVHN).

```eval
# ResNet-50 on CIFAR-10
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar10_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-res50-vanila-ood.png
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar10-l1_0.01_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-res50-l1-ood.png
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar10-l2_0.003_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-res50-l2-ood.png
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar10-wass_0.001_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-res50-wass-ood.png
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar10-per_0.03_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-res50-per-ood.png

# ResNet-50 on CIFAR-100
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar100_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-res50-vanila-ood.png
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar100-l1_0.01_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-res50-l1-ood.png
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar100-l2_0.01_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-res50-l2-ood.png
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar100-wass_0.01_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-res50-wass-ood.png
python eval.py --arch resnet50 --resume ./benchmark/resnet50-dropout0.0-exp_num0-wd0.0001-cifar100-per_1.0_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-res50-per-ood.png

# VGG-16 on CIFAR-10
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar10_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-vgg16-vanila-ood.png
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar10-l1_0.01_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-vgg16-l1-ood.png
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar10-l2_0.003_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-vgg16-l2-ood.png
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar10-wass_0.001_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-vgg16-wass-ood.png
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar10-per_0.003_model.th --gpu 0 --dataset cifar10 --save_fig ./cifar10-vgg16-per-ood.png

# VGG-16 on CIFAR-100
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar100_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-vgg16-vanila-ood.png
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar100-l1_0.003_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-vgg16-l1-ood.png
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar100-l2_0.01_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-vgg16-l2-ood.png
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar100-wass_0.03_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-vgg16-wass-ood.png
python eval.py --arch vgg16 --resume ./benchmark/vgg16-dropout0.0-exp_num0-wd0.0005-cifar100-per_1.0_model.th --gpu 0 --dataset cifar100 --save_fig ./cifar100-vgg16-per-ood.png
```
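
The checkpoint paths passed to `--resume` above follow a regular pattern that encodes the training options. The helper below is a hypothetical convenience, not part of the repository: the pattern is inferred only from the file names listed in the commands above, and the function name and its parameters are assumptions of this sketch. Numeric fields are passed as strings so that they are copied verbatim.

```python
def checkpoint_path(arch, dataset, weight_decay, regularizer=None, reg_coeff=None,
                    exp_num=0, dropout="0.0", save_dir="./benchmark"):
    """Build a checkpoint path matching the file names used in the eval commands.

    Hypothetical helper: the naming pattern is inferred from the README,
    not read from train.py.
    """
    name = f"{arch}-dropout{dropout}-exp_num{exp_num}-wd{weight_decay}-{dataset}"
    if regularizer is not None:
        # Regularized runs append a tag such as "l1", "l2", "wass", or "per"
        # followed by the regularization coefficient.
        name += f"-{regularizer}_{reg_coeff}"
    return f"{save_dir}/{name}_model.th"


# Vanilla ResNet-50 on CIFAR-10 (wd0.0001 appears in the listed file names).
vanilla_ckpt = checkpoint_path("resnet50", "cifar10", "0.0001")

# VGG-16 on CIFAR-100 with sliced Wasserstein regularization, coefficient 0.03.
wass_ckpt = checkpoint_path("vgg16", "cifar100", "0.0005", "wass", "0.03")
```

If a model is trained with a different coefficient or experiment number, the path given to `--resume` presumably needs to change accordingly.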


## Results
With each regularization method, ResNet-50 achieves the following performance on CIFAR-100 (for more details, see Table 1 in the manuscript).

### Generalization and calibration performance
| Method               | Accuracy         | NLL             | ECE              |
| -------------------- | ---------------- | --------------- | ---------------- |
| Vanilla              | 74.64 $\pm$ 0.04 | 1.31 $\pm$ 0.02 | 13.95 $\pm$ 0.26 |
| $L^1$ regularization | 76.28 $\pm$ 0.50 | 1.27 $\pm$ 0.02 | 7.77 $\pm$ 0.24  |
| $L^2$ regularization | 75.84 $\pm$ 0.53 | 1.07 $\pm$ 0.02 | 5.52 $\pm$ 0.31  |
| Sliced Wasserstein   | 76.27 $\pm$ 0.33 | 1.1 $\pm$ 0.01  | 7.02 $\pm$ 0.31  |
| PER                  | 76.23 $\pm$ 0.28 | 1.15 $\pm$ 0.0  | 4.67 $\pm$ 0.3   |


### Out-of-distribution predictive uncertainty
<figure>
<img src="ood_plots.png" alt="drawing" width="900"/>
<figcaption>The red and blue plots represent the predictive uncertainty for in-distribution samples (CIFAR-100) and out-of-distribution samples (SVHN), respectively.</figcaption>
</figure>

## Reference
Our code is based on the following public repository:
* CIFAR experiment: https://github.com/facebookresearch/mixup-cifar10

## License
Our code is released under the [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) license.



