# README

## Environment
```
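# Install this project's Python dependencies
pip install -r requirements.txt
# Install lm-evaluation-harness from source in editable mode
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
# Return to the starting directory (the later commands use relative paths)
cd ..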
```

## Train

SFT training:
```
cd ft_code
bash run_sft.sh
```

DPO training:
```
cd ft_code
bash run_dpo.sh
```

Safe Unlearning training:
```
cd ft_code
bash run_safeunlearn.sh
```

For the RMU and Circuit Breaker implementations, please refer to their respective repositories.

## Evaluation

### ASR Evaluation
Evaluation for AutoDAN, GCG and PAIR:
```
cd attack_evaluation
bash run_attack.sh
bash gen_eval_adaptive.sh
```
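
The two scripts above are listed in order, and the second presumably consumes the output of the first (an assumption; the source does not state this). Under that assumption, a small wrapper that stops at the first failure avoids evaluating missing or stale attack results. `run_in_order` is a hypothetical name used for illustration, not something this repository provides.

```shell
# Hypothetical helper: run the given scripts one after another with bash,
# stopping at the first one that exits with a non-zero status.
run_in_order() {
    for script in "$@"; do
        echo "running $script"
        bash "$script" || { echo "failed: $script" >&2; return 1; }
    done
}
```

After defining the function in the current shell, it could be used as `cd attack_evaluation && run_in_order run_attack.sh gen_eval_adaptive.sh`.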

Evaluation for Raw and Manual:
```
cd attack_evaluation
bash gen_eval.sh
```

Evaluation for Prefilling:
```
cd attack_evaluation
bash prefilling.sh
```

Note: before running, replace the model paths in the shell scripts with the actual paths on your machine.
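
One way to find the paths that need editing is to search the scripts for path-like strings. The sketch below is only an illustration: `list_path_lines` is a hypothetical name, and the pattern matches any token with at least two `/`-separated components, so it may also flag data or output paths that do not need changing.

```shell
# Hypothetical helper (not part of this repository): print every line in the
# *.sh files of a directory that contains a path-like string, prefixed with
# the file name and line number, so the paths can be reviewed and edited by hand.
list_path_lines() {
    dir="${1:-.}"
    # Match tokens with at least two slash-separated components,
    # e.g. /data/models/demo
    grep -HnE '(/[A-Za-z0-9._-]+){2,}' "$dir"/*.sh
}
```

For example, running `list_path_lines attack_evaluation` from the repository root lists the candidate lines in that directory's scripts.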

### Over-Refusal Evaluation
Evaluation for XSTest:
```
cd attack_evaluation
bash run_xstest.sh
```

### General Performance Evaluation
Evaluation for MTBench:
```
cd quality_evaluation/mtbench
bash eval.sh
```

Evaluation for MMLU:
```
cd quality_evaluation/mmlu
bash lm_eval.sh
```
