# Redflag LLM

## Installation

⚠️  This package requires:
- A Flash Attention installation.
- Note: this submission codebase may NOT work with models that have untied embeddings (e.g. Llama 8B). It relies on the PEFT `TrainableTokens` tuner, for which a custom version was implemented to work around a PEFT bug. That bug has since been fixed upstream, but the fix is untested with this code. The custom PEFT version is not currently included.

Installation is done automatically unless you need a different version of Flash Attention. The data must be downloaded manually.

> By default, we install [flash_attn-2.8.2+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl](https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.2/flash_attn-2.8.2+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl) which may not be suitable for your system.


Install the base environment:

```bash
# if necessary 
module load python/3.10 cuda/12.4

# init venv
virtualenv venv
source venv/bin/activate

# install with uv, much faster
pip install uv
uv pip install -e .
```
Once the data is set up correctly, you should be able to run jobs (see below).

Note: the [requirements](./pyproject.toml#L35) pin the Flash Attention wheel listed above. **If your environment does not match this version, find and install the correct version as described below**.

### Finding the required flash attention version

Install the appropriate version of flash attention from the [releases page](https://github.com/Dao-AILab/flash-attention/releases).
In my case, my PyTorch version was 2.7.1 compiled with CUDA version 12.6, and Python 3.10.11.
You can check these with the following commands in your venv:

```bash
python --version
# Python 3.10.11

uv pip freeze | grep torch
# torch==2.7.1

python -c "import torch; print('torch cuda version:', torch.version.cuda); print('cxx11abi=', torch._C._GLIBCXX_USE_CXX11_ABI)"
# torch cuda version: 12.6
# cxx11abi= True
```

So I installed the corresponding flash attention package (v2.8.2, the latest release at the time):

```bash
uv pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.2/flash_attn-2.8.2+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
```
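The wheel filenames on the releases page encode these versions. As a minimal sketch, assuming every release wheel follows the same naming pattern as the default wheel above (only the CUDA major version, the torch major.minor version, the C++11 ABI flag and the CPython tag vary), a small bash function can assemble the URL from the values printed by the version-check commands. `fa_wheel_url` is a hypothetical helper, not part of this repository:

```bash
# Hypothetical helper: build a flash-attn release-wheel URL from version info.
# Assumes filenames follow the pattern of the default wheel:
#   flash_attn-<FA>+cu<CUDA major>torch<torch major.minor>cxx11abi<TRUE|FALSE>-cp<py>-cp<py>-linux_x86_64.whl
fa_wheel_url() {
  local fa_version="$1"     # flash-attn release, e.g. 2.8.2
  local cuda_version="$2"   # torch.version.cuda, e.g. 12.6
  local torch_version="$3"  # e.g. 2.7.1 (a local suffix such as +cu126 is dropped by cut)
  local cxx11abi="$4"       # torch._C._GLIBCXX_USE_CXX11_ABI: True or False
  local py_version="$5"     # e.g. 3.10.11

  local cuda_major torch_mm abi py_tag
  cuda_major="${cuda_version%%.*}"                                   # 12.6    -> 12
  torch_mm="$(printf '%s' "$torch_version" | cut -d. -f1,2)"         # 2.7.1   -> 2.7
  abi="$(printf '%s' "$cxx11abi" | tr '[:lower:]' '[:upper:]')"      # True    -> TRUE
  py_tag="cp$(printf '%s' "$py_version" | cut -d. -f1,2 | tr -d .)"  # 3.10.11 -> cp310

  printf 'https://github.com/Dao-AILab/flash-attention/releases/download/v%s/flash_attn-%s+cu%storch%scxx11abi%s-%s-%s-linux_x86_64.whl\n' \
    "$fa_version" "$fa_version" "$cuda_major" "$torch_mm" "$abi" "$py_tag" "$py_tag"
}

# With the versions reported above, this prints the URL of the default wheel.
fa_wheel_url 2.8.2 12.6 2.7.1 True 3.10.11
```

To install, pass your own values, e.g. `uv pip install "$(fa_wheel_url 2.8.2 12.6 2.7.1 True 3.10.11)"`. Not every combination is published, so check that the generated filename exists on the releases page before relying on it.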


## Run experiments

### With Hydra

Pulls default settings from [configs_v2/config.yaml](./configs_v2/config_llama3-2.yaml).

```bash
python run_hydra.py training_args.output_dir=OUTDIR +label=baseline

# submit jobs via slurm
python run_hydra.py -m hydra/launcher=a100l training_args.output_dir=outputs/OUTDIR2 training_args.learning_rate=2.0e-5
```

More complicated runs:

```bash
# when specifying replacement modules, you need to use the '/' syntax e.g. training_args/adv_attack=<ATTACK OBJ>
# when doing scalar overrides, use '.' e.g. training_args.alpha_rf_xent=0.1

# locally
python run_hydra.py --config-path=configs_v2 --config-name=config \
	script_args.drop_rf_proba=0.1 \
	script_args/insert_sampler/distribution=normal_long \
	training_args.kl_weighting.scaling_length=60 \
	training_args.alpha_rf_xent=0.1 \
	training_args/adv_attack=scaledl2sgd \
	+label='PAPER-longer_scaling_AT'

# or use --multirun and specify the submitit launcher to launch a Slurm job
python run_hydra.py --config-path=configs_v2 --config-name=config --multirun hydra/launcher=a100l \
	script_args.drop_rf_proba=0.1 \
	script_args/insert_sampler/distribution=normal_long \
	training_args.kl_weighting.scaling_length=60 \
	training_args.alpha_rf_xent=0.1 \
	training_args/adv_attack=scaledl2sgd \
	+label='PAPER-longer_scaling_AT'
```

You can launch multi-GPU jobs simply by specifying the appropriate launcher. Make sure the launcher sets the correct number of tasks (one task per GPU):

```bash
python run_hydra.py --config-path=configs_v2 --config-name=config --multirun hydra/launcher=a100l_2gpu_ddp \
	script_args.drop_rf_proba=0.1 \
	script_args/insert_sampler/distribution=normal_long \
	training_args.kl_weighting.scaling_length=60 \
	training_args.alpha_rf_xent=0.1 \
	training_args/adv_attack=scaledl2sgd \
	+label='PAPER-longer_scaling_AT'
```
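The three `PAPER-longer_scaling_AT` commands above differ only in the launcher. As a sketch, the shared overrides can be kept in one bash array and the launcher passed as an argument. `run_paper` and `PAPER_OVERRIDES` are hypothetical helpers, not part of this repository; setting `DRY_RUN=1` prints the command instead of running it:

```bash
# Hypothetical helper: shared overrides for the PAPER-longer_scaling_AT runs.
# In Hydra's override grammar, 'group/option' selects a config-group option,
# 'a.b=value' overrides a value, and '+key=value' appends a key that is not
# in the default config.
PAPER_OVERRIDES=(
  script_args.drop_rf_proba=0.1
  script_args/insert_sampler/distribution=normal_long
  training_args.kl_weighting.scaling_length=60
  training_args.alpha_rf_xent=0.1
  training_args/adv_attack=scaledl2sgd
  "+label=PAPER-longer_scaling_AT"
)

# Usage: run_paper [LAUNCHER] [EXTRA_OVERRIDES...]
# An empty or missing LAUNCHER runs locally; otherwise the job is submitted with --multirun.
run_paper() {
  local launcher="${1:-}"
  if [ "$#" -gt 0 ]; then shift; fi
  local cmd=(python run_hydra.py --config-path=configs_v2 --config-name=config)
  if [ -n "$launcher" ]; then
    cmd+=(--multirun "hydra/launcher=$launcher")
  fi
  cmd+=("${PAPER_OVERRIDES[@]}" "$@")
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"   # only print the command
  else
    "${cmd[@]}"        # run it
  fi
}

# Examples (prefix with DRY_RUN=1 to only print the command):
#   run_paper                        # local run
#   run_paper a100l                  # single Slurm job via the submitit launcher
#   run_paper a100l_2gpu_ddp         # multi-GPU job (launcher must request one task per GPU)
#   run_paper a100l training_args.learning_rate=1.0e-5,2.0e-5   # Hydra comma sweep: one job per value
```

Keeping the overrides in an array preserves each one as a separate argument, so values containing spaces or quotes are passed to Hydra unchanged.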
