
# Source Code for Constrained Parameter Regularization

This repository contains the source code to replicate the experiments of the paper "Constrained Parameter Regularization".

Please use a system with at least one GPU. The GPT2 experiments require 8 GPUs of the Ampere, Ada, or Hopper generation.

## Install conda environment

#### If conda is not installed:
```bash
bash install_conda.sh
```

#### Install conda env (takes about 1h):
```bash
bash setup_env.sh
```

#### Install conda env without flash attention (for GPUs other than Ampere, Ada, or Hopper; supports only the Grokking and CIFAR-100 experiments):
```bash
bash setup_env_without_flash.sh
```
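
Which of the two setup scripts applies depends on the GPU generation. As a rough sketch, the choice can be derived from the CUDA compute capability, assuming that 8.x corresponds to Ampere/Ada and 9.x to Hopper. The helper `choose_setup_script` is hypothetical and not part of this repository, and the `compute_cap` query field of `nvidia-smi` needs a reasonably recent NVIDIA driver.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): pick a setup script
# from a CUDA compute capability string such as "8.6".
# Assumption: 8.x = Ampere/Ada, 9.x = Hopper; everything else gets the
# environment without flash attention.
choose_setup_script() {
  local major="${1%%.*}"   # keep the part before the first dot
  case "$major" in
    8|9) echo "setup_env.sh" ;;
    *)   echo "setup_env_without_flash.sh" ;;
  esac
}

# Query the GPUs; this branch is skipped if nvidia-smi is missing or fails.
if cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null)"; then
  cap="${cap%%$'\n'*}"     # use the first GPU only
  echo "Compute capability ${cap}: bash $(choose_setup_script "$cap")"
fi
```

The script only prints a suggestion; running the suggested setup script remains a manual step.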

#### Activate conda env:
```bash
conda activate cpr
```

## Grokking Experiment

The grokking experiment should run within a few minutes. The results will be saved in the `grokking` folder.
To replicate the results in the paper, run variations with the following arguments:

####  For AdamW:
```bash
python train_grokking_task.py --optimizer adamw --weight_decay 1.0
```

####  For Adam + Rescaling:
```bash
python train_grokking_task.py --optimizer adamw --weight_decay 0.0 --rescale 0.8
```

####  For AdamCPR with L2 norm as regularization function:
```bash
python train_grokking_task.py --optimizer adamcpr --mode l2_constrain --kappa_init_dependent 0.8
```

####  For AdamCPR with standard deviation as regularization function:
```bash
python train_grokking_task.py --optimizer adamcpr --mode std_constrain --kappa_init_dependent 0.9
```

####  For AdamAdaCPR:
```bash
python train_grokking_task.py --optimizer adamcpr --mode l2_constrain --kappa_init_dependent 0.8 --kappa_adapt 1
```
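
To run all five variants above in one go, a small wrapper loop may be convenient. The following is a minimal sketch, not a script shipped with this repository: the `DRY_RUN` switch and the `run_variant` function are introduced here for illustration only, and the flags are copied from the commands above. It does not check whether consecutive runs overwrite each other's outputs in the `grokking` folder.

```bash
#!/usr/bin/env bash
# Sketch: run (or just print) the five grokking variants from this section.
# DRY_RUN is a hypothetical switch introduced for this example only.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

variants=(
  "--optimizer adamw --weight_decay 1.0"
  "--optimizer adamw --weight_decay 0.0 --rescale 0.8"
  "--optimizer adamcpr --mode l2_constrain --kappa_init_dependent 0.8"
  "--optimizer adamcpr --mode std_constrain --kappa_init_dependent 0.9"
  "--optimizer adamcpr --mode l2_constrain --kappa_init_dependent 0.8 --kappa_adapt 1"
)

run_variant() {
  local -a args
  read -r -a args <<< "$1"   # split the flag string into separate words
  if [ "$DRY_RUN" = "1" ]; then
    echo "python train_grokking_task.py ${args[*]}"
  else
    python train_grokking_task.py "${args[@]}"
  fi
}

for v in "${variants[@]}"; do
  run_variant "$v"
done
```

By default the loop only prints the commands; setting `DRY_RUN=0` in the environment executes them sequentially.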



## Image Classification Experiment

The CIFAR-100 experiment should run within 20-30 minutes. The results will be saved in the `cifar100` folder.

####  For AdamW:
```bash
python train_resnet.py --optimizer adamw --lr 0.001 --weight_decay 0.001
```

####  For Adam + Rescaling:
```bash
python train_resnet.py --optimizer adamw --lr 0.001 --weight_decay 0 --rescale_alpha 0.8
```

####  For AdamCPR with L2 norm as regularization function and kappa initialization depending on the parameter initialization:
```bash
python train_resnet.py --optimizer adamcpr --lr 0.001 --mode l2_constrain --kappa_init_dependent 0.8
```

####  For AdamCPR with L2 norm as regularization function and kappa initialization with warm start:
```bash
python train_resnet.py --optimizer adamcpr --lr 0.001 --mode l2_constrain --kappa_init_warm_start 1000
```

####  For AdamAdaCPR with L2 norm as regularization function and kappa initialization with warm start:
```bash
python train_resnet.py --optimizer adamcpr --lr 0.001 --mode l2_constrain --kappa_init_warm_start 1000 --kappa_adapt True
```


## Language Modelling Experiment

The GPT2s experiment should run within 11 hours on 8 GPUs (min. 24 GB VRAM). On the first start, the script downloads the data and prepares Arrow files, which can take a few hours.
To test the setup, we include a test config that uses WikiText with a limited number of training and validation steps. The results and TensorBoard logs are written to the experiment folder.

To test the setup on wikitext and AdamW:
```bash 
python train_transformer.py -c test_config.yaml train.optimizer_name=adamw
```

To test the setup on wikitext and AdamCPR:
```bash 
python train_transformer.py -c test_config.yaml train.optimizer_name=adamcpr train.adamcpr.kappa_init_warm_start=1000
```


To run the GPT2s experiment on AdamW:
```bash 
python train_transformer.py -c gpt2s_config.yaml train.optimizer_name=adamw
```

To run the GPT2s experiment on AdamCPR:
```bash 
python train_transformer.py -c gpt2s_config.yaml train.optimizer_name=adamcpr train.adamcpr.kappa_init_warm_start=1000
```

To run the GPT2m experiment on AdamW:
```bash 
python train_transformer.py -c gpt2m_config.yaml train.optimizer_name=adamw
```

To run the GPT2m experiment on AdamCPR:
```bash 
python train_transformer.py -c gpt2m_config.yaml train.optimizer_name=adamcpr train.adamcpr.kappa_init_warm_start=1000
```
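
The six commands above follow one pattern: a config file selected by prefix, an optimizer override, and, for AdamCPR, the warm-start override. The sketch below makes that pattern explicit. `build_lm_command` is a hypothetical helper, not part of the repository, and it only prints the commands rather than launching them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: assemble a train_transformer.py call from a config
# prefix (test, gpt2s, gpt2m) and an optimizer name (adamw, adamcpr).
build_lm_command() {
  local config="$1" optimizer="$2"
  local cmd="python train_transformer.py -c ${config}_config.yaml train.optimizer_name=${optimizer}"
  # The AdamCPR runs in this section all add the warm-start override.
  if [ "$optimizer" = "adamcpr" ]; then
    cmd+=" train.adamcpr.kappa_init_warm_start=1000"
  fi
  echo "$cmd"
}

# Print all six commands from this section.
for config in test gpt2s gpt2m; do
  for optimizer in adamw adamcpr; do
    build_lm_command "$config" "$optimizer"
  done
done
```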


## Medical Image Segmentation Experiments

We use the nnU-Net framework for the segmentation experiments. For more information, refer to the original nnU-Net documentation: https://github.com/MIC-DKFZ/nnUNet.
Assuming a working environment with nnU-Net v2 installed and the (preprocessed) datasets available in the nnU-Net format, set the following environment variables:

```bash
export nnUNet_preprocessed=/path/to/nnUNet_preprocessed
export nnUNet_results=/path/to/experiment/results
```
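
Before launching long training runs, it can help to verify that both variables are set and point to existing directories. The following pre-flight check is a sketch under that assumption; `check_nnunet_env` is a hypothetical function and not part of nnU-Net or this repository.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if both variables are set
# and point to existing directories.
check_nnunet_env() {
  local name status=0
  for name in nnUNet_preprocessed nnUNet_results; do
    if [ -z "${!name:-}" ]; then
      # ${!name} is bash indirect expansion: the value of the variable
      # whose name is stored in $name.
      echo "missing: $name is not set" >&2
      status=1
    elif [ ! -d "${!name}" ]; then
      echo "missing: $name=${!name} is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}
```

A training command can then be chained behind it, for example `check_nnunet_env && python train_nnUNet.py ...`, so that nothing starts when a path is wrong.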


With these set, the complete list of experiment commands can be generated via `generate_all_AdamCPR_experiment_commands.sh` and `generate_all_AdamW_baseline_commands.sh`.
For example:
```bash
python train_nnUNet.py 2 3d_fullres 0 -tr nnUNetTrainerQuickAdamCPR -adam_cpr_mode l2_constrain_mh -adam_cpr_kappa 1.0 -adam_cpr_kappa_init_steps 2000
```

Note that these commands assume dataset IDs following nnU-Net's numbering: 2 = Dataset002_Heart, 17 = Dataset017_AbdominalOrganSegmentation, 82 = Dataset082_BraTS2020.
If you used different dataset IDs when generating the dataset.json for nnU-Net's planning & preprocessing, adapt the IDs in the run scripts accordingly.
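
To see quickly whether a local setup matches this numbering, the expected folder names can be checked against the preprocessed root. This is a sketch that assumes preprocessed datasets are stored as `$nnUNet_preprocessed/<DatasetName>`; `dataset_name` and `report_datasets` are hypothetical helpers introduced for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the dataset IDs assumed by the run scripts to
# their nnU-Net folder names.
dataset_name() {
  case "$1" in
    2)  echo "Dataset002_Heart" ;;
    17) echo "Dataset017_AbdominalOrganSegmentation" ;;
    82) echo "Dataset082_BraTS2020" ;;
    *)  echo "unknown dataset ID: $1" >&2; return 1 ;;
  esac
}

# Report which of the expected folders exist under a preprocessed root.
# Assumption: one folder per dataset directly below the root.
report_datasets() {
  local root="$1" id name
  for id in 2 17 82; do
    name="$(dataset_name "$id")"
    if [ -d "$root/$name" ]; then
      echo "found   $id -> $name"
    else
      echo "missing $id -> $name"
    fi
  done
}
```

Calling `report_datasets "$nnUNet_preprocessed"` lists each expected ID as found or missing; a missing entry suggests the dataset was registered under a different ID or name.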
