# Hierarchical OOD Detection
Hierarchical out-of-distribution (OOD) detection is an interpretable OOD
technique capable of intermediate inference, which makes its predictions more
useful to the end user.

This repository contains the code necessary to reproduce our results.

# Contents
1. [Setup](#setup)
2. [Inspecting hierarchies](#inspecting-hierarchies)
3. [Visualizing hierarchies in D3](#visualizing-hierarchies-in-d3)
4. [Training](#training)
5. [Evaluation](#evaluation)
6. [Inference](#inference)
7. [Other](#other)

# Setup
This package [requires](./requirements.txt) several Python packages. With pip,
they can typically be installed via `pip install -r requirements.txt`.

We use Google protocol buffers to handle the model parameters and
hyperparameters of each experiment. The protocol buffer binary is included in
the `local/` directory.

We baseline against [NBDT: Neural Backed Decision Trees](https://github.com/alvinwan/neural-backed-decision-trees).
To generate your own NBDT-based trees, clone that repository into `nbdt/`.
We provide a [utility script](./utils/convert_nbdt_hierarchy.py) to convert
an NBDT graph into our hierarchy format. You can also generate an NBDT hierarchy
with the script [gen-radial-dendrogram](./bin/gen-radial-dendrogram) or
by following the instructions from NBDT.

To train and evaluate models, you will need to download [Imagenet-1k](https://www.image-net.org/).
We provide a [script](./data/gen-symlinks.sh) to help set up our dataset splits;
it creates symlinks to the original Imagenet-1k data location. For example, to
create the "Imagenet-100" and "Imagenet-1K" splits described in the paper, run
the following commands:
```sh
cd data
# Imagenet 100
mkdir -p coarse/{train,val,ood}
./gen-symlinks.sh coarse-id-labels.csv <IMAGENET1K_TRAIN_DIR> coarse/train
./gen-symlinks.sh coarse-id-labels.csv <IMAGENET1K_VAL_DIR> coarse/val
./gen-symlinks.sh coarse-ood-labels.csv <IMAGENET1K_VAL_DIR> coarse/ood

# Imagenet 1K
mkdir -p imagenet1000/{train,val,ood}
./gen-symlinks.sh imagenet1000-prune25p_id_labels.csv <IMAGENET1K_TRAIN_DIR> imagenet1000/train
./gen-symlinks.sh imagenet1000-prune25p_id_labels.csv <IMAGENET1K_VAL_DIR> imagenet1000/val
./gen-symlinks.sh imagenet1000-prune25p_ood_labels.csv <IMAGENET1K_VAL_DIR> imagenet1000/ood

# Imagenet 1K Fine, Medium, Coarse ood
mkdir -p imagenet1000-{fine,medium,coarse}ood
./gen-symlinks.sh imagenet1000-prune25p_ood_labels.csv <IMAGENET1K_VAL_DIR> imagenet1000/ood fine
./gen-symlinks.sh imagenet1000-prune25p_ood_labels.csv <IMAGENET1K_VAL_DIR> imagenet1000/ood medium
./gen-symlinks.sh imagenet1000-prune25p_ood_labels.csv <IMAGENET1K_VAL_DIR> imagenet1000/ood coarse
```

NOTE: Our Imagenet-100 and Imagenet-1K splits are provided in the following
csv files:
- [Imagenet 100 ID](./data/coarse-id-labels.csv)
- [Imagenet 100 OOD](./data/coarse-ood-labels.csv)
- [Imagenet 1K ID](./data/imagenet1000-prune25p_id_labels.csv)
- [Imagenet 1K OOD](./data/imagenet1000-prune25p_ood_labels.csv)
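
To make the symlinking step concrete, here is a minimal Python sketch of the general idea: read class names from a label CSV and link the matching class directories into a split directory. It is not the implementation of `gen-symlinks.sh`. The function name `link_classes` is hypothetical, and the sketch assumes that the first CSV column holds the class directory name and that the file has no header row.

```python
import csv
import os
from pathlib import Path


def link_classes(labels_csv, src_root, dst_root):
    """Symlink each class directory named in labels_csv from src_root into dst_root.

    Assumption: the first CSV column is the class directory name and
    the file has no header row.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    linked, missing = [], []
    with open(labels_csv, newline="") as f:
        for row in csv.reader(f):
            if not row or not row[0].strip():
                continue  # skip blank lines
            name = row[0].strip()
            src = src_root / name
            dst = dst_root / name
            if not src.is_dir():
                missing.append(name)  # class not present in the source tree
                continue
            if not dst.is_symlink() and not dst.exists():
                os.symlink(src.resolve(), dst, target_is_directory=True)
            linked.append(name)
    return linked, missing


# Usage (paths are placeholders):
#   linked, missing = link_classes("coarse-id-labels.csv",
#                                  "<IMAGENET1K_TRAIN_DIR>", "coarse/train")
```

Returning the list of missing classes makes it easy to spot a wrong source directory before training starts.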

# Inspecting Hierarchies
We provide all of the hierarchies we used as the following `.pth` files:
- [Imagenet 100 Pruned WN](./pruned-wn.pth)
- [Imagenet 100 2 Lvl WN](./two-lvl-wn.pth)
- [Imagenet 100 NBDT](./nbdt-induced-coarse-softmaxR0.pth)
- [Imagenet 100 & Imagenet 1K MOS](./imagenet1000-mos.pth)
- [Imagenet 1K Pruned WN](./imagenet1000-wn.pth)
- [Imagenet 1K 100 Synset WN](./imagenet1000-wn-100synset.pth)
- [Imagenet 1K 50 Synset WN](./imagenet1000-wn-50synset.pth)

NOTE: We also use the Imagenet 100 2 Lvl WN for the MOS experiments.

You can inspect the hierarchies directly by loading them with `torch.load`
(they are dictionaries), or you can load them as `hierarchy_util.Hierarchy`
objects. For the latter, first start an interactive Python session:
```sh
# A plain `python` session works as well
jupyter console
```
Then, in the interactive session, execute:
```python
import torch
import pandas as pd
import hierarchy_util as hutil

id_labels = pd.read_csv('./data/coarse-id-labels.csv', header=None)
ood_labels = pd.read_csv('./data/coarse-ood-labels.csv', header=None)
all_labels = id_labels[0].tolist()
all_labels.extend(ood_labels[0].tolist())

H = hutil.Hierarchy(all_labels, './pruned-wn.pth')
```
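
For the `torch.load` route, a small helper that lists keys, value types, and sizes can help when exploring an unfamiliar dictionary. The sketch below is a generic helper and assumes nothing about the actual keys stored in the `.pth` files. The name `summarize` is hypothetical and not part of this repository.

```python
def _size(value):
    """Return a ', len=N' suffix for sized values, or '' when len() is unsupported."""
    try:
        return f", len={len(value)}"
    except TypeError:
        return ""


def summarize(obj, depth=0, max_depth=2, max_items=5):
    """Return lines describing keys, value types, and sizes of a nested dict."""
    lines = []
    indent = "  " * depth
    if isinstance(obj, dict):
        for i, (key, value) in enumerate(obj.items()):
            if i >= max_items:
                lines.append(f"{indent}... ({len(obj) - max_items} more keys)")
                break
            lines.append(f"{indent}{key!r}: {type(value).__name__}{_size(value)}")
            # Recurse into nested dicts until max_depth is reached
            if depth + 1 < max_depth and isinstance(value, dict):
                lines.extend(summarize(value, depth + 1, max_depth, max_items))
    else:
        lines.append(f"{indent}{type(obj).__name__}")
    return lines


# Usage (requires torch and the repository files):
#   import torch
#   hierarchy = torch.load('./pruned-wn.pth')
#   print("\n".join(summarize(hierarchy)))
```

On recent PyTorch releases, `torch.load` may need `weights_only=False` if a file contains arbitrary Python objects. Only pass that flag for files you trust.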
 
# Visualizing hierarchies in D3

We also provide a [script](./d3code/gen_hierarchy.py) that generates
[D3](https://d3js.org/) web pages for visualizing hierarchies. Run
`cd d3code; python gen_hierarchy.py -h` for usage.
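
For readers unfamiliar with D3, its hierarchy layouts conventionally consume nested objects in which each node carries a `children` array. The sketch below shows that target shape on a toy example. It is not how `gen_hierarchy.py` works. The child-to-parent input mapping, the function name `to_d3_tree`, and the toy labels are assumptions made for illustration only.

```python
import json


def to_d3_tree(parent_of, root):
    """Nest a child -> parent mapping into {"name": ..., "children": [...]} form."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    def build(node):
        entry = {"name": node}
        kids = sorted(children.get(node, []))
        if kids:  # leaves carry no "children" key
            entry["children"] = [build(k) for k in kids]
        return entry

    return build(root)


# Toy hierarchy (hypothetical labels, not taken from the provided files)
toy = {"dog": "animal", "cat": "animal", "animal": "entity", "car": "entity"}
print(json.dumps(to_d3_tree(toy, "entity"), indent=2))
```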

# Training
All of our experimental configurations are stored in Google protocol
buffer files. We provide the Imagenet-100 experiment configs and the Imagenet-1K
config directory structure as examples. The protocol buffer fields are defined
in the [main proto file](./protos/main.proto) and the [ensemble proto file](./protos/ensemble.proto).
Experiment configs can be found under the `experiments` directory. For example,
the HSC Imagenet-100 $\alpha=1.0$, $\beta=0.0$ experiment config is
[here](./experiments/coarse/cascade/pruned-wn/softpred_R0/exp.config).
To train this model, run the following command:
```sh
CUDA_VISIBLE_DEVICES=<DEVICE> python main.py --config_fn experiments/coarse/cascade/pruned-wn/softpred_R0/exp.config
```
Note: For ensembles, we symlink to the individual models:
```sh
cd experiments/coarse/cascade/pruned-wn/
ln -s "$(pwd)/softpred_R0" ensemble_M3/R0
ln -s "$(pwd)/softpred_R1" ensemble_M3/R1
ln -s "$(pwd)/softpred_R2" ensemble_M3/R2
```
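
Because the ensemble links to several individually trained runs, it can be convenient to launch the member trainings from one place. The following is a minimal sketch that reuses the `main.py --config_fn` invocation shown above. It assumes that `softpred_R1` and `softpred_R2` each contain an `exp.config` laid out like `softpred_R0`. The helper names `member_commands` and `run_all` are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path


def member_commands(exp_root, members, script="main.py"):
    """Build one `python <script> --config_fn <member>/exp.config` command per member."""
    return [
        [sys.executable, script, "--config_fn", str(Path(exp_root) / m / "exp.config")]
        for m in members
    ]


def run_all(commands, device="0", dry_run=True):
    """Print each command and, unless dry_run is set, run them one after another."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=device)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)


cmds = member_commands(
    "experiments/coarse/cascade/pruned-wn",
    ["softpred_R0", "softpred_R1", "softpred_R2"],
)
run_all(cmds, dry_run=True)  # set dry_run=False to actually start training
```

The dry run only prints the commands, which is a cheap way to check the config paths before committing GPU time.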

# Evaluation
The [gather metrics](./gather_metrics.py) Python script generates and logs the
metrics for a given model.
To gather metrics for the model trained above, run the following command:
```sh
CUDA_VISIBLE_DEVICES=<DEVICE> python gather_metrics.py --config_fn experiments/coarse/cascade/pruned-wn/softpred_R0/exp.config
```

To gather ensemble metrics, run:
```sh
CUDA_VISIBLE_DEVICES=<DEVICE> python gather_ensemble_metrics.py --config_fn experiments/coarse/cascade/pruned-wn/ensemble_M3/exp.config
```

# Inference
Please see the [Jupyter Notebook](./OODGamify.ipynb) for calculating
hierarchical inference metrics.

# Other
This code base provides many additional capabilities that are not described
here. Please read through the code and the additional scripts to learn how to
use them.
