# TCCM
This is the official repository for the NeurIPS 2025 submission "Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching".

## Instructions on reproducing the main results
### Setup Instructions
The codebase was developed with Python 3.9.21 and runs on Linux.

1. Download or clone this repository to your local machine or, preferably, to a cluster.

2. Install Python and the required packages:
```bash
pip install -r requirements.txt
```

### Datasets
The ADBench datasets used in our experiments are compressed in ```datasets.zip```.

**Please unzip the file in the current directory so that the extracted contents form the ```./datasets``` directory.**
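If no `unzip` tool is at hand, the extraction step can be sketched with Python's standard library. This is a minimal sketch, not part of the repository: the helper name `extract_datasets` is hypothetical, and it assumes the archive's top-level entry is `datasets`, as the instruction above implies.

```python
import zipfile
from pathlib import Path


def extract_datasets(archive="datasets.zip", target_dir="."):
    """Extract the archive into target_dir and return its top-level entry names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        # Collect the first path component of every archive member.
        top_level = sorted({name.split("/")[0] for name in zf.namelist() if name})
    return top_level


if __name__ == "__main__":
    if Path("datasets.zip").is_file():
        print("Top-level entries:", extract_datasets())
        print("./datasets present:", Path("datasets").is_dir())
    else:
        print("datasets.zip not found; run this from the repository root.")
```

If the printed top-level entries do not include `datasets`, the archive layout differs from what this sketch assumes and the files may need to be moved by hand.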

All datasets are split into four categories: small, medium, high-dimensional (high_dim), and large (see Table 1).

### Main Results
#### To reproduce the benchmark results shown in Figure 2 (AUROC and AUPRC)
Run the following from the root directory of this repository:

```bash
chmod u+x ./bash_files/run_everything.sh
./bash_files/run_everything.sh
```

The log of each run is stored under the ```./logs``` directory as ```./logs/seed_$RANDOM_SEED/run_{dataset_ID_NAME}_model_{model_ID}.log```.

Once all runs have finished, the message ```LARGE datasets processed.``` appears at the end of the log file ```./logs/All_log.log```.
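Since a full run can take a long time, it may be convenient to check progress from a script. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it relies only on the log locations and the completion message quoted above.

```python
from pathlib import Path

DONE_MARKER = "LARGE datasets processed."  # completion message quoted in this README


def all_runs_finished(log_path="./logs/All_log.log", tail_lines=5):
    """Return True if the completion message appears in the last few lines of the log."""
    path = Path(log_path)
    if not path.is_file():
        return False
    lines = path.read_text(errors="replace").splitlines()
    return any(DONE_MARKER in line for line in lines[-tail_lines:])


def run_logs_for_seed(seed, logs_dir="./logs"):
    """List per-run log files for one seed, following the naming pattern above."""
    return sorted((Path(logs_dir) / f"seed_{seed}").glob("run_*_model_*.log"))


if __name__ == "__main__":
    print("All runs finished:", all_runs_finished())
    print("Run logs for seed 0:", len(run_logs_for_seed(0)))
```

The seed value `0` is only an example; use whichever seeds `run_everything.sh` is configured with.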

#### Retrieve results and visualization
```bash
python AggregateResults.py
python Visualization.py
```
The AUROC and AUPRC plots will be saved as ```Rank_ROC.pdf``` and ```Rank_PR.pdf```.

The dataset-wise tabular results as well as the rankings of models are saved under ```./final_metrics/all```.

NOTES:
- All experiment runs are executed on CPU only.
- For each dataset, we allow each algorithm to use a maximum of 10 GB of RAM and a maximum runtime of 3 days.
    - You can change the ```MEMORY_LIMIT``` and ```TIME_LIMIT``` in ```run_everything.sh``` accordingly.
- By default, we run 45 × (```K``` = 3) = 135 jobs in parallel. Please adjust the value of ```K``` as needed.
- By default, the experiments run on all datasets for all models, which can take a very long time.
- Results may differ slightly from those reported in the paper because numerical precision and randomness depend on the CPU hardware. However, **we have verified the results on at least two very different clusters, and the overall ranking of models remains consistent**.
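As a rough aid for choosing ```K```, the job-count arithmetic from the notes can be sketched as below. This is an illustration only: the factor 45 is taken from the note above and assumed to be fixed by the script, and the "one CPU core per job" rule is a simplifying assumption of this sketch, not a recommendation from the authors.

```python
import os

JOBS_PER_K = 45  # factor from the note above; assumed fixed by run_everything.sh


def parallel_jobs(k):
    """Number of jobs launched in parallel for a given K."""
    return JOBS_PER_K * k


def largest_k_for(cores):
    """Largest K with 45 * K <= cores, but never less than 1 (one-core-per-job heuristic)."""
    return max(1, cores // JOBS_PER_K)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    k = largest_k_for(cores)
    print(f"Default K=3 launches {parallel_jobs(3)} jobs.")
    print(f"With {cores} cores, K={k} launches {parallel_jobs(k)} jobs.")
```

Memory matters as well: with the default limit of 10 GB per algorithm run, many concurrent jobs can require far more RAM than a typical workstation provides.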

### Ablation Studies
To reproduce the ablation study results shown in Appendix D.3, run the following command from the root directory of this project:
```bash
python AblationStudies.py
```

The results (plots) will be saved under the folder ```./results_ablation```:
- ```Time_Embedding_Figure_7.pdf```: Study 1, Time Embedding Variants Used in TCCM.
- ```Sensitivity_t_Figure_8.pdf```: Study 2, Sensitivity to Fixed Time t during Inference.
- ```Noise_Injection_Figure_9.pdf```: Study 3, Effect of Noise Injection during Training.
- ```Contamination_Figure_10.pdf```: Study 4, Effect of Contamination in Training Data.
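To confirm that the ablation run produced every plot listed above, a small check along these lines may help. It is a sketch rather than part of the repository; the helper name `missing_plots` is hypothetical, while the file names and output folder are the ones given in this section.

```python
from pathlib import Path

# File names as listed in this README.
EXPECTED_PLOTS = [
    "Time_Embedding_Figure_7.pdf",
    "Sensitivity_t_Figure_8.pdf",
    "Noise_Injection_Figure_9.pdf",
    "Contamination_Figure_10.pdf",
]


def missing_plots(results_dir="./results_ablation"):
    """Return the expected plot files that are not present in results_dir."""
    root = Path(results_dir)
    return [name for name in EXPECTED_PLOTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_plots()
    if missing:
        print("Missing plots:", ", ".join(missing))
    else:
        print("All ablation plots are present.")
```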

### Empirical Robustness Verification
To reproduce the empirical robustness verification results for TCCM shown in Appendix D.4, run the following commands from the root directory of this project:
```bash
chmod u+x ./bash_files/run_robustness.sh
./bash_files/run_robustness.sh
```

The results (both raw data and plots) are located in the ```./results_robustness``` directory:
- combined_FP_False.pdf: False Negative Attack – aims to make anomaly samples appear as normal.
- combined_FP_True.pdf: False Positive Attack – aims to make normal samples appear as anomalies.