# AP-OOD: Attention Pooling for Out-of-Distribution Detection

This is the implementation of "AP-OOD: Attention Pooling for Out-of-Distribution Detection". You can find a generic implementation of AP-OOD in the class `HopfieldSoftmaxSplitOODDetector`, located in `__init__.py` of the `ood_core` package.

## Installation

- AP-OOD works best with Anaconda ([download here](https://www.anaconda.com/download)). 
  To install AP-OOD and all dependencies, run the following commands:

  ```
  conda env create -f environment.yml
  conda activate ap-ood
  pip install -e .
  ```

## Weights and Biases

- AP-OOD supports logging with Weights and Biases (W&B). By default, W&B logs all metrics in [anonymous mode](https://docs.wandb.ai/guides/app/features/anon). Note that runs logged in anonymous mode are deleted after 7 days. To keep the logs, [create a W&B account](https://docs.wandb.ai/quickstart) and then log in to it from the command line with `wandb login`.

## Data Sets
To run the experiments, you need the following data sets. We follow the benchmark from [Ren et al. (2022)](https://arxiv.org/abs/2209.15558).

The locations of the data sets and other environment variables are managed via a `.env` file. Copy the `.env.examples` file located in the root directory of the repository, name the copy `.env`, and edit it to contain the paths to the data sets on your machine.
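
Before starting a long run, it can help to check that the `.env` file defines the variables this README mentions. The following is a minimal sketch, not part of the repository: the helper names `parse_env_text` and `missing_vars` are hypothetical, the parser only handles plain `KEY=VALUE` lines, and the variable list is collected from this README, so the `.env.examples` file remains the authoritative reference.

```python
from pathlib import Path

# Variable names mentioned in this README (assumption: .env.examples may
# define more, fewer, or differently named variables).
REQUIRED_VARS = [
    "PARACRAWL_ROOT",
    "OPUS_ROOT",
    "WMT_DEV_ROOT",
    "WMT_TEST_ROOT",
    "EMBEDDING_ROOT",
    "WMT_MODEL_CHECKPOINT",
]


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_vars(parse_env_text(env_path.read_text()))
        print("Missing variables:", missing or "none")
    else:
        print("No .env file found in the current directory.")
```

Not every variable is needed for every task. For example, the summarization experiments described below only mention `EMBEDDING_ROOT`, so a shorter `required` list can be passed to `missing_vars`.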

### In-Distribution Data Sets

  * [XSUM](https://huggingface.co/datasets/EdinburghNLP/xsum): Automatically downloaded from HuggingFace
  * [WMT15 En--Fr](https://huggingface.co/datasets/wmt/wmt15): Automatically downloaded from HuggingFace

### Auxiliary Outlier Data Sets

  * [C4](https://huggingface.co/datasets/allenai/c4): Automatically downloaded from HuggingFace
  * [ParaCrawlv9](https://opus.nlpl.eu/ParaCrawl/en&fr/v9/ParaCrawl): Download it from the link (format bilingual-moses), extract it, and set the environment variable `PARACRAWL_ROOT` to the location of the extracted file.


### Out-of-Distribution Test Data Sets

The OOD test data for the summarization task consists of:

* [CNN/Daily Mail](https://huggingface.co/datasets/abisee/cnn_dailymail): Automatically downloaded from HuggingFace
* [Lil-Lab Newsroom](https://huggingface.co/datasets/lil-lab/newsroom): Automatically downloaded from HuggingFace
* [Reddit-TIFU](https://huggingface.co/datasets/ctr4si/reddit_tifu): Automatically downloaded from HuggingFace
* [Samsum](https://huggingface.co/datasets/Samsung/samsum): Automatically downloaded from HuggingFace

The OOD test data for the translation task consists of the data sets listed below. For the Opus data sets, create a new directory and set the environment variable `OPUS_ROOT` to its location.

* [Newstest14](https://www.statmt.org/wmt15/translation-task.html): Download the development sets from the link and set the environment variable `WMT_DEV_ROOT` to the location of the extracted files.
* [Newsdiscussdev2015](https://www.statmt.org/wmt15/translation-task.html): Download the development sets from the link and set the environment variable `WMT_DEV_ROOT` to the location of the extracted files.
* [Newsdiscusstest2015](https://www.statmt.org/wmt15/translation-task.html): Download the test sets from the link and set the environment variable `WMT_TEST_ROOT` to the location of the extracted files.
* [Opus-Law](https://opus.nlpl.eu/ELRC-EUIPO_law/en&fr/v1/ELRC-EUIPO_law): Download the data set (format bilingual-moses) from the link and place it in `OPUS_ROOT` in the subdirectory `law`.
* [Opus-Medical](https://opus.nlpl.eu/EMEA/en&fr/v3/EMEA): Download the data set (format bilingual-moses) from the link and place it in `OPUS_ROOT` in the subdirectory `medical`.
* [Opus-Koran](https://opus.nlpl.eu/Tanzil/en&fr/v1/Tanzil): Download the data set (format bilingual-moses) from the link and place it in `OPUS_ROOT` in the subdirectory `Koran`.
* [Opus-IT](https://opus.nlpl.eu/Ubuntu/en&fr/v14.10/Ubuntu): Download the data set (format bilingual-moses) from the link and place it in `OPUS_ROOT` in the subdirectory `it`.
* [Opus-Subtitles](https://opus.nlpl.eu/OpenSubtitles/en&fr/v2018/OpenSubtitles): Download the data set (format bilingual-moses) from the link and place it in `OPUS_ROOT` in the subdirectory `subtitles`.
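
The Opus layout described above can be verified with a short script before running the translation experiments. This is a sketch under assumptions: the function name `missing_opus_subdirs` is hypothetical, the subdirectory names are copied from the list above, and the check only confirms that each directory exists, not that the files inside it are complete or correctly named.

```python
import os
from pathlib import Path

# Subdirectory names as written in the list above (note the capital "K").
OPUS_SUBDIRS = ["law", "medical", "Koran", "it", "subtitles"]


def missing_opus_subdirs(opus_root, expected=OPUS_SUBDIRS):
    """Return the expected subdirectories that do not exist under opus_root."""
    root = Path(opus_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    opus_root = os.environ.get("OPUS_ROOT")
    if opus_root is None:
        print("OPUS_ROOT is not set.")
    else:
        missing = missing_opus_subdirs(opus_root)
        print("Missing subdirectories:", missing or "none")
```

The script reads `OPUS_ROOT` from the process environment, so the variable has to be exported in the shell (or loaded from `.env` by other means) before running it.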

## How to Run

### Summarization

1. Set the environment variable `EMBEDDING_ROOT` to the location where you want to store the language model embeddings.
2. To create the input and output embeddings for text summarization, run the command
   ```
   python text_ood/create_embeddings.py -cn summarization-pegasus-xsum --multirun embedding_type=INPUT,OUTPUT
   ```
3. To run the unsupervised method on the input and output, run
   ```
   python text_ood/run_methods.py -cn summarization-pegasus-xsum-input method=ours
   python text_ood/run_methods.py -cn summarization-pegasus-xsum-output method=ours
   ```
4. To run the supervised method on the input and output, run
   ```
   python text_ood/run_methods.py -cn summarization-pegasus-xsum-input method=hopfield_classifier_fully
   python text_ood/run_methods.py -cn summarization-pegasus-xsum-output method=hopfield_classifier_fully
   ```
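
The steps above can also be chained from a small driver script. The sketch below is not part of the repository: `summarization_commands` and `run_all` are hypothetical helpers that only assemble the command lines shown above and, when `dry_run` is disabled, execute them in order with `subprocess.run`. It assumes the script is started from the repository root with the `ap-ood` environment active.

```python
import subprocess
import sys


def summarization_commands(config="summarization-pegasus-xsum", method="ours"):
    """Build the commands from the steps above as argument lists."""
    commands = [
        # Step 2: create input and output embeddings in one multirun.
        [sys.executable, "text_ood/create_embeddings.py", "-cn", config,
         "--multirun", "embedding_type=INPUT,OUTPUT"],
    ]
    # Steps 3 and 4: run the chosen method on the input and output embeddings.
    for side in ("input", "output"):
        commands.append(
            [sys.executable, "text_ood/run_methods.py", "-cn",
             f"{config}-{side}", f"method={method}"]
        )
    return commands


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands for the unsupervised method.
    run_all(summarization_commands(method="ours"), dry_run=True)
```

Passing `method="hopfield_classifier_fully"` produces the commands for the supervised method instead. Note that each call to `summarization_commands` includes the embedding step, so drop the first command if the embeddings already exist.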


### Translation

1. Set the environment variable `WMT_MODEL_CHECKPOINT` to the location where you want to store the model checkpoints.
2. Set the environment variable `EMBEDDING_ROOT` to the location where you want to store the language model embeddings.
3. Train the model using
   ```
   python text_ood/transformer/train_wmt.py
   ```
4. To create the input and output embeddings for translation, run the command
   ```
   python text_ood/create_embeddings.py -cn translation-transformer-wmt --multirun embedding_type=INPUT,OUTPUT
   ```
5. To run the unsupervised method on the input and output, run
   ```
   python text_ood/run_methods.py -cn translation-transformer-wmt-input method=ours
   python text_ood/run_methods.py -cn translation-transformer-wmt-output method=ours
   ```
6. To run the supervised method on the input and output, run
   ```
   python text_ood/run_methods.py -cn translation-transformer-wmt-input method=hopfield_classifier_fully
   python text_ood/run_methods.py -cn translation-transformer-wmt-output method=hopfield_classifier_fully
   ```
