# Code Guide

## Start the Remote Reward Model and Judge Model
```bash
bash examples/scripts/my_chosen_remote_rm_url.sh
```

## Start PPO Training with the $A^2$ Reward Model
```bash
bash examples/scripts/train_ppo_adv_chosen_llama_with_remote_rm.sh
```

## Use AdvGenerator
Use the trained AdvGenerator to generate adversarial examples:
```bash
python adv_data_maker/chosen_maker.py
```

Score the generated adversarial examples with the reward model:
```bash
python adv_data_maker/reward_model_filter.py
```

Select high-quality adversarial examples:
```bash
python adv_data_maker/chosen_data/sigmoid_chose.py
```
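
The script name suggests a sigmoid-based selection rule, but its actual logic is not described here. The following is only a minimal sketch of one way such a filter could work, assuming each example carries a scalar reward. The field names (`text`, `reward`), the `select_examples` helper, and the `0.7` threshold are all hypothetical and are not taken from `sigmoid_chose.py`.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def select_examples(examples, threshold=0.7):
    """Keep examples whose sigmoid-squashed reward reaches the threshold.

    Hypothetical rule: the repository's script may use a different criterion.
    """
    kept = []
    for ex in examples:
        confidence = sigmoid(ex["reward"])
        if confidence >= threshold:
            # Copy the example and record the confidence used for the decision.
            kept.append({**ex, "confidence": confidence})
    return kept


# Toy data for illustration only.
examples = [
    {"text": "sample a", "reward": 2.0},
    {"text": "sample b", "reward": -1.0},
    {"text": "sample c", "reward": 0.5},
]
selected = select_examples(examples)
print([ex["text"] for ex in selected])
```

Squashing the raw reward into the range (0, 1) makes the threshold easier to interpret than a cutoff on unbounded reward scores.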

## Attack Baseline
Construct adversarial datasets with the baseline attack methods in this directory:
```bash
adv_eval_part/attack_success_eval/attack_baseline
```

## Train Reward Model
```bash
bash examples/scripts/train_rm_ours.sh
```

## Run Evaluation
Evaluate reward models on benchmarks:
```bash
python adv_eval_part/reward_model_eval.py
```

Evaluate the attack success rate:
```bash
python adv_eval_part/attack_success_eval/eval.py
```
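
This guide does not define the attack success rate, so the sketch below assumes one common convention: an attack succeeds when the reward model scores the adversarial response higher than the reference response. The `ScoredPair` type and `attack_success_rate` function are hypothetical illustrations and do not reflect the interface of `eval.py`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ScoredPair:
    """Reward-model scores for one reference/adversarial response pair."""
    reference_reward: float
    adversarial_reward: float


def attack_success_rate(pairs: Iterable[ScoredPair]) -> float:
    """Fraction of pairs where the adversarial response outscores the reference.

    Assumed definition: the repository's evaluation may use another criterion.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0  # Avoid division by zero on an empty evaluation set.
    successes = sum(
        1 for p in pairs if p.adversarial_reward > p.reference_reward
    )
    return successes / len(pairs)


# Toy scores for illustration only.
pairs = [
    ScoredPair(reference_reward=0.2, adversarial_reward=0.9),
    ScoredPair(reference_reward=0.5, adversarial_reward=0.1),
    ScoredPair(reference_reward=-1.0, adversarial_reward=0.3),
    ScoredPair(reference_reward=1.2, adversarial_reward=1.0),
]
rate = attack_success_rate(pairs)
print(f"attack success rate: {rate:.2f}")
```

Under this convention, a lower rate after adversarial training would indicate a more robust reward model.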