# NCSB
Official implementation of [Anomaly Detection by Estimating Gradients of the Tabular Data Distribution](LINK) from the Thirteenth International Conference on Learning Representations (ICLR 2025). The repository includes the code for Noise Conditional Score-Based Models for anomaly detection on tabular data, together with 45 further baselines. The experiments cover 57 datasets from ADBench (47 tabular datasets, five datasets of representations extracted from images, and five datasets of embeddings extracted from NLP tasks, with 122 sub-datasets overall) and 15 additional datasets from the literature.

| **Name**    | **Source URL**                                                                                                                                                                                                                               | **Datasets**                                                                                                                                                                                                                                                                                                      |
|-------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ADBench     | https://github.com/Minqi824/ADBench/tree/main/datasets (Automatic loading, no download necessary) | 1_ALOI, 2_annthyroid, 3_backdoor, 4_breastw, 5_campaign, 6_cardio, 7_Cardiotocography, 8_celeba, 9_census, 10_cover, 11_donors, 12_fault, 13_fraud, 14_glass, 15_Hepatitis, 16_http, 17_InternetAds, 18_Ionosphere, 19_landsat, 20_letter, 20news_0, 20news_1, 20news_2, 20news_3, 20news_4, 20news_5, 21_Lymphography, 22_magic.gamma, 23_mammography, 24_mnist, 25_musk, 26_optdigits, 27_PageBlocks, 28_pendigits, 29_Pima, 30_satellite, 31_satimage-2, 32_shuttle, 33_skin, 34_smtp, 35_SpamBase, 36_speech, 37_Stamps, 38_thyroid, 39_vertebral, 40_vowels, 41_Waveform, 42_WBC, 43_WDBC, 44_Wilt, 45_wine, 46_WPBC, 47_yeast, CIFAR10_0, CIFAR10_1, CIFAR10_2, CIFAR10_3, CIFAR10_4, CIFAR10_5, CIFAR10_6, CIFAR10_7, CIFAR10_8, CIFAR10_9, FashionMNIST_0, FashionMNIST_1, FashionMNIST_2, FashionMNIST_3, FashionMNIST_4, FashionMNIST_5, FashionMNIST_6, FashionMNIST_7, FashionMNIST_8, FashionMNIST_9, MNIST-C_brightness, MNIST-C_canny_edges, MNIST-C_dotted_line, MNIST-C_fog, MNIST-C_glass_blur, MNIST-C_identity, MNIST-C_impulse_noise, MNIST, MNIST-C_motion_blur, MNIST-C_rotate, MNIST-C_scale, MNIST-C_shear, MNIST-C_shot_noise, MNIST-C_spatter, MNIST-C_stripe, MNIST-C_translate, MNIST-C_zigzag, MVTec-AD_bottle, MVTec-AD_cable, MVTec-AD_capsule, MVTec-AD_carpet, MVTec-AD_grid, MVTec-AD_hazelnut, MVTec-AD_leather, MVTec-AD_metal_nut, MVTec-AD_pill, MVTec-AD_screw, MVTec-AD_tile, MVTec-AD_toothbrush, MVTec-AD_transistor, MVTec-AD_wood, MVTec-AD_zipper, SVHN_0, SVHN_1, SVHN_2, SVHN_3, SVHN_4, SVHN_5, SVHN_6, SVHN_7, SVHN_8, SVHN_9, agnews_0, agnews_1, agnews_2, agnews_3, amazon, imdb, yelp |
| ELKI        | https://www.dbs.ifi.lmu.de/research/outlier-evaluation/DAMI/                                                                                                                                                                                 | Parkinson_withoutdupl_norm_75.arff                                                               |
| extended AE | https://www.kaggle.com/datasets/shasun/tool-wear-detection-in-cnc-mill, https://www.kaggle.com/datasets/inIT-OWL/high-storage-system-data-for-energy-optimization, https://www.kaggle.com/datasets/shrutimehta/nasa-asteroids-classification | HRSS_anomalous_optimized.csv, HRSS_anomalous_standard.csv, nasa.csv, and the entire folder: "CNC-kaggle"                                                                                                                                                                                                                                                                                                                                                                   |
| Goldstein   | http://dx.doi.org/10.7910/DVN/OPQMVF                                                                                                                                                                                                         | pen-global-unsupervised-ad.csv, pen-local-unsupervised-ad.csv                                                                                                                                                                                                                           |
| ODDS        | http://odds.cs.stonybrook.edu/                                                                                                                                                                                                               | arrhythmia.mat, wbc.mat and non ".mat" data: seismic-bumps.arff, yeast.data, yeast.names |
| ICL         | https://openreview.net/forum?id=_hszZbt46bT (Supplementary Material) | Abalone.data, Ecoli.data, Mulcross.arff |



## Setup Instructions

### 1. Install the required packages

You will need to install [ADBench](https://arxiv.org/abs/2206.09426) and torchvision for this project. Use the torchvision build that matches the PyTorch version on your machine; ADBench already pulls in most of the other dependencies needed for the project. Python 3.8 or newer is required.

To install all packages, run the following command:


`pip install -r requirements.txt`


Note that there is currently a dependency conflict between ADBench and PyOD: ADBench runs with PyOD 1.0.0. Models from newer PyOD versions were therefore added manually.
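To confirm which versions ended up in your environment after installation, a small check like the following can help. This is a minimal sketch, not part of the repository: the helper name `installed_version` is hypothetical, and the package names are assumed to be the PyPI distribution names (`pyod`, `adbench`, `torch`, `torchvision`).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions (PyPI names); adjust if your setup differs.
    for name in ("pyod", "adbench", "torch", "torchvision"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If the reported PyOD version is not 1.0.0, the ADBench dependency conflict described above is the likely place to look first.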


### 2. Add additional datasets

Place every dataset other than the ADBench datasets into its corresponding subfolder of the `raw_data` folder:

```none
Score-based-Anomaly-Detection
├── baselines
├── diffusion
├── NCSBAD
├── raw_data
│   ├── ELKI_data_raw
│   │   ├── Parkinson_withoutdupl_norm_75.arff
│   ├── extended_AE_data_raw
│   │   ├── HRSS_anomalous_optimized.csv
│   │   ├── HRSS_anomalous_standard.csv
│   │   ├── nasa.csv
│   │   ├── CNC-kaggle
│   │   │   ├── experiment_01.csv - experiment_18.csv
│   │   │   ├── README.txt
│   │   │   ├── test_artifact.jpg
│   │   │   ├── train.csv
│   ├── Goldstein_data_raw
│   │   ├── pen-global-unsupervised-ad.csv
│   │   ├── pen-local-unsupervised-ad.csv
│   ├── ODDS_data_raw
│   │   ├── matfile_data
│   │   │   ├── arrythmia.mat
│   │   │   ├── wbc.mat
│   │   ├── other_data
│   │   │   ├── abalone.data
│   │   │   ├── ecoli.data
│   │   │   ├── mulcross.arff
│   │   │   ├── seismic.arff
│   │   │   ├── yeast.data
```
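Before processing, it can be useful to check that the files are where the tree above says they should be. The following is a minimal sketch under stated assumptions: the helper `missing_raw_files` is hypothetical and not part of the repository, the relative paths are copied from the tree above (the Goldstein files are assumed to sit directly inside `Goldstein_data_raw`), and the individual CNC-kaggle experiment files are omitted for brevity.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
# Assumption: file names match the tree exactly; CNC-kaggle experiment files are omitted.
EXPECTED_FILES = [
    "ELKI_data_raw/Parkinson_withoutdupl_norm_75.arff",
    "extended_AE_data_raw/HRSS_anomalous_optimized.csv",
    "extended_AE_data_raw/HRSS_anomalous_standard.csv",
    "extended_AE_data_raw/nasa.csv",
    "extended_AE_data_raw/CNC-kaggle/train.csv",
    "Goldstein_data_raw/pen-global-unsupervised-ad.csv",
    "Goldstein_data_raw/pen-local-unsupervised-ad.csv",
    "ODDS_data_raw/matfile_data/arrythmia.mat",
    "ODDS_data_raw/matfile_data/wbc.mat",
    "ODDS_data_raw/other_data/abalone.data",
    "ODDS_data_raw/other_data/ecoli.data",
    "ODDS_data_raw/other_data/mulcross.arff",
    "ODDS_data_raw/other_data/seismic.arff",
    "ODDS_data_raw/other_data/yeast.data",
]


def missing_raw_files(raw_data_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under `raw_data_dir`."""
    root = Path(raw_data_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_raw_files("raw_data")` from the repository root returns an empty list when every listed file is in place.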

The raw data can then be processed using the `read_raw_write_in_format.py` script, which stores the sorted data in the folder `formated_data`. After that, run `save_datasets_add.py` to create the experiment datasets.


### 3. Add ADBench Datasets

Download the ADBench datasets automatically by starting `python` in your console and running the following:

```
from adbench.myutils import Utils
utils = Utils() # utility function
utils.download_datasets(repo='github')
```

Exit the Python console by pressing `Ctrl+D`. The ADBench experiment datasets are produced automatically by running `save_datasets.py`.

Then run the following commands in order:

1. `python read_raw_write_in_format.py` (creates the additional data in the required format)
2. `python save_datasets.py` (creates the ADBench datasets)
3. `python save_datasets_add.py` (creates the additional datasets)
4. `python run_all_refs_val.py` (runs the full benchmark)

Results are saved in the results folder.
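The four commands listed above can also be chained from a single driver script. This is a minimal sketch rather than something shipped with the repository: the function `run_pipeline` and its `runner` parameter are hypothetical, and the script names and their order are taken from the steps above. It assumes it is called from the repository root.

```python
import subprocess
import sys

# Script names and order are taken from the steps above.
PIPELINE = [
    "read_raw_write_in_format.py",
    "save_datasets.py",
    "save_datasets_add.py",
    "run_all_refs_val.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        print(f"Running {script} ...")
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit code.
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` runs the steps one after another; because of `check=True`, a failing step stops the sequence before the benchmark starts on incomplete data.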

To run the interpretability experiment, run `python main_inter.py`. The resulting image is saved in the `mnist` folder.

