

<div align="center">
    
![Python](https://img.shields.io/badge/Python-3.10-3776AB)
![License](https://img.shields.io/badge/License-BSD3-orange)
    
</div>

<div align="center">
    <h3>
      <a href="#installation">Installation</a> |
      <a href="#reproduction-of-experiments">Reproduction of Experiments</a>
    </h3>
</div>

---

# AutoNNU-Net: Towards Automated Medical Image Segmentation

Integration of Automated Machine Learning (AutoML) methods into nnU-Net.

- Free software: BSD license
- Documentation: https://autonnunet.readthedocs.io.

## Repo Structure
The repository is structured in the following directories:
* ```autonnunet```: The AutoNNU-Net python package, including
    * ```analysis```: Plotting, DeepCAVE utilities
    * ```datasets```: MSD Dataset handling
    * ```evaluation```: Prediction tools for the MSD test set
    * ```experiment_planning```: Extensions to the nnU-Net prediction tools for AutoNNU-Net
    * ```hnas```: Hierarchical NAS search space and integration into AutoNNU-Net
    * ```inference```: Prediction within AutoNNU-Net+
    * ```utils```: Collection of various utilities, e.g., paths
* ```data```: Everything related to (MSD) datasets
* ```output```: Everything that is generated by AutoNNU-Net locally, e.g. optimization results, MSD submissions
* ```results_zipped```: Compressed output, which is stored in the repo
* ```runscripts```: The scripts used to run experiments etc.
* ```submodules```: Git submodules, e.g. hypersweeper, nnU-Net etc
* ```tests```: Unit tests for AutoNNU-Net
* ```paper```: Plots and tables generated by plotting scripts

# Installation

***Important regarding AutoML Conference 2025 Submission:*** Due to anonymization of the repository, we had to exclude forks that were linked as submodules. Therefore, we included the submodules as regular files.

***Important:*** This code was tested only on Rocky Linux 9.5 with CUDA 12.4. Other operating systems, GPUs, and CUDA versions may not be supported. Installing the CUDA drivers before AutoNNU-Net is highly recommended, since the installation of PyTorch may fail otherwise. On HPC systems, for example, this means loading the CUDA module before installing the package.

***Important:*** Due to compatibility issues with ```numpy```, ```DeepCAVE``` is not listed as a requirement of ```AutoNNU-Net```. However, ```DeepCAVE``` is needed to create the plots and tables. Therefore, we recommend installing ```DeepCAVE``` manually after running the experiments.

1. Clone the repository and its submodules
```bash
git clone https://anonymous.4open.science/r/AutonnUNet autonnunet
cd autonnunet
```

2. Create and activate an Anaconda/Miniconda environment with Python 3.10
```bash
conda create -n autonnunet python=3.10
conda activate autonnunet
```

3. Install AutoNNU-Net
```bash
make install
```

***Important***: The automated installation is convenient if you want to install all submodules automatically. However, it is also quite sensitive to system-specific Python and package versions. Therefore, if the installation using make fails, we recommend installing the subpackages manually:

```bash
# submodules
cd submodules/batchgenerators && pip install . && cd ../../
cd submodules/hypersweeper && pip install . && cd ../../
cd submodules/MedSAM && pip install . && cd ../../
cd submodules/neps && pip install . && cd ../../
cd submodules/nnUNet && pip install . && cd ../../

# AutoNNUNet
pip install -e ".[dev]"
```
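
After the installation, a quick sanity check can confirm that PyTorch is importable and can see a GPU. The following is a minimal sketch and not part of the repository; the helper name `describe_torch_install` is hypothetical, and it relies only on `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install() -> str:
    """Return a one-line summary of the local PyTorch/CUDA setup."""
    # Avoid an ImportError if the installation did not complete.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    # is_available() is False when no usable CUDA driver/GPU is visible.
    return f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"


print(describe_torch_install())
```

If CUDA is reported as unavailable on an HPC system, a likely cause is that the CUDA module was not loaded before the installation.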

# Reproduction of Experiments

## Cluster Setup
For our experiments, we used ```submitit-slurm``` to run code on a SLURM cluster. You can define your custom SLURM cluster configuration in ```runscripts/configs/cluster```.

We ran all experiments using the ```gpu``` cluster configuration.
If you want to run your experiments locally, please use ```cluster=local``` for every command that uses Hydra.

## Download Datasets
To download a specific dataset, run
```bash
python autonnunet/datasets/msd_dataset.py --dataset_name=<dataset>
```

For example, to download D01 (BrainTumour), run:
```bash
python autonnunet/datasets/msd_dataset.py --dataset_name=Dataset001_BrainTumour
```

To download all datasets, run
```bash
./runscripts/download_msd.sh
```

## Convert and Pre-process Datasets for nnU-Net
***Important***: This has to be executed on the same cluster/compute environment as the target for the training to get the correct nnU-Net configurations, e.g. by appending ```cluster=gpu```.

```bash
python runscripts/convert_and_preprocess_nnunet.py -m "dataset=glob(*)"
```

## Convert and Pre-process Datasets for MedSAM2

***Important***: The pre-processing for MedSAM2 must be executed locally, i.e., it cannot be submitted to a SLURM cluster due to compatibility issues between pickle and multiprocessing.

```bash
python runscripts/convert_and_preprocess_medsam2.py -m "dataset=glob(*)" "cluster=local"
```

## Baseline Training

### nnU-Net Conv
```bash
python runscripts/train.py -m "dataset=glob(*)" "fold=range(5)"
```
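
In this and the following training commands, Hydra's multirun flag `-m` launches one job per combination of the swept values: `glob(*)` selects every option of the `dataset` config group and `range(5)` yields the folds 0 to 4. The sketch below only illustrates that grid expansion; the function `expand_sweep` is hypothetical, and `Dataset002_Heart` is an assumed config name used for illustration (only `Dataset001_BrainTumour` appears in this README).

```python
from itertools import product

# Hypothetical subset of dataset config names; the second name is an
# assumption made for this example.
datasets = ["Dataset001_BrainTumour", "Dataset002_Heart"]
folds = range(5)  # mirrors "fold=range(5)": folds 0, 1, 2, 3, 4


def expand_sweep(datasets, folds):
    """List the override pairs of a dataset x fold grid sweep."""
    return [(f"dataset={d}", f"fold={f}") for d, f in product(datasets, folds)]


jobs = expand_sweep(datasets, folds)
for overrides in jobs:
    # One line per job that the sweep would launch.
    print("python runscripts/train.py", *overrides)
```

With two datasets and five folds, this lists ten jobs; sweeping over all dataset configs scales the number of jobs accordingly.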

### nnU-Net ResM
```bash
python runscripts/train.py -m "dataset=glob(*)" "fold=range(5)" "hp_config.encoder_type=ResidualEncoderM"
```

### nnU-Net ResL
```bash
python runscripts/train.py -m "dataset=glob(*)" "fold=range(5)" "hp_config.encoder_type=ResidualEncoderL"
```

### MedSAM2

***Important***: Before fine-tuning MedSAM2 on a dataset, you must first run the training of at least one of the nnU-Net models on that dataset, because the nnU-Net training creates the dataset splits.

1. Download model checkpoint
```bash
cd submodules/MedSAM && mkdir checkpoints && cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
cd ../../../
```

2. Fine-tune MedSAM2 
```bash
python runscripts/finetune_medsam2.py -m "dataset=glob(*)" "fold=range(5)"
```

## Compute Hyperband budgets

```bash
python runscripts/determine_hyperband_budgets.py --b_min=10 --b_max=1000 --eta=3
```
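
To give an idea of what these arguments imply, the following sketch computes the textbook Hyperband budget ladder, in which the maximum budget is repeatedly divided by `eta` as long as the result stays at or above the minimum budget. It is an illustration under that assumption, not the implementation of `determine_hyperband_budgets.py`, whose exact rounding and output format may differ; the function name `hyperband_budgets` is hypothetical.

```python
def hyperband_budgets(b_min: float, b_max: float, eta: int) -> list[float]:
    """Return the geometric budget ladder b_max / eta**k that stays >= b_min."""
    if b_min <= 0 or b_max < b_min or eta < 2:
        raise ValueError("require 0 < b_min <= b_max and eta >= 2")

    # Largest number of eta-fold reductions that does not drop below b_min.
    s_max = 0
    while b_max / eta ** (s_max + 1) >= b_min:
        s_max += 1

    # Smallest budget first, full budget last.
    return [b_max / eta**k for k in range(s_max, -1, -1)]


budgets = hyperband_budgets(b_min=10, b_max=1000, eta=3)
print([round(b, 2) for b in budgets])
```

For `b_min=10`, `b_max=1000`, and `eta=3`, this yields five budget levels: roughly 12.35, 37.04, 111.11, 333.33, and 1000.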

## Auto-nnU-Net

```bash
python runscripts/train.py --config-name=tune_autonnunet -m "dataset=Dataset001_BrainTumour"
```

## HPO + HNAS Ablations

```bash
python runscripts/train.py --config-name=tune_hpo -m "dataset=Dataset001_BrainTumour"
```

```bash
python runscripts/train.py --config-name=tune_hpo_hnas -m "dataset=Dataset001_BrainTumour"
```


## Extract & Train Incumbent

Incumbent configurations are stored in `runscripts/configs/incumbent`. You can find our incumbent configurations already in this directory.
If you want to re-create them after running the experiments, you need to run:
```bash
python runscripts/extract_incumbents.py --approach=hpo
```

Using these configs, you can then train the incumbent configurations with the following command:
```bash
python runscripts/train.py -m "dataset=<dataset_name>" "+incumbent=Dataset001_BrainTumour_<approach>" "fold=range(5)" "pipeline.remove_validation_files=False"
```
Please note that you could also use the model saved during the optimization. 
In our experiments, we did not store model checkpoints in the respective run directories to reduce the memory consumption.

To run nnU-Net with the incumbent configuration for the HPO approach on D01, run
```bash
python runscripts/train.py -m "dataset=Dataset001_BrainTumour" "+incumbent=Dataset001_BrainTumour_hpo" "fold=range(5)"
```

## Cross Evaluation
For the cross-evaluation of incumbent configurations, we select the 9 out of 10 datasets on which HPO+NAS achieved an improvement.
To train all datasets with the incumbent configuration of another dataset, run
```bash
./runscripts/train_cross_eval.sh <dataset_name>
```

## Inference and MSD Submission

```bash
python runscripts/run_inference.py --approach=<approach>
```

Or directly submit it to SLURM:
```bash
sbatch runscripts/run_inference.sh <approach>
```

This creates the MSD submission in `output/msd_submissions`.

## Plots and Tables

To generate all plots and tables in the paper and store them in `output/paper`, run
```bash
python runscripts/plot.py
```

## Credits

This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and the [`audreyr/cookiecutter-pypackage`](https://github.com/audreyr/cookiecutter-pypackage) project template.

## Common Issues

### TorchInductor fails when loading JSON, found extra data
Sometimes during optimization, jobs fail while loading cached TorchInductor files.
To fix this, run
```bash
rm -rf ~/.cache/torch
rm -rf ~/.cache/triton/
rm -rf ~/.nv/ComputeCache
```
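
If the caches need to be cleared from within a Python job script instead of the shell, the same cleanup can be sketched as follows. The helper `clear_compile_caches` is hypothetical and not part of AutoNNU-Net; it removes the same three directories as the commands above, relative to the given home directory.

```python
import shutil
from pathlib import Path

# Same three directories as in the shell commands above, relative to $HOME.
CACHE_DIRS = (".cache/torch", ".cache/triton", ".nv/ComputeCache")


def clear_compile_caches(home: Path) -> list[Path]:
    """Delete the cache directories under `home` and return those removed."""
    removed = []
    for rel in CACHE_DIRS:
        path = home / rel
        if path.is_dir():
            # ignore_errors mimics the forgiving behaviour of `rm -rf`.
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed
```

Calling `clear_compile_caches(Path.home())` before starting the optimization then has the same effect as the three `rm -rf` commands.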