# RLSF: Reinforcement Learning via Symbolic Feedback

This repository contains the code for _Translation of Pseudo-Code to C++ code using RLSF_.

Dependencies
----------
The repository has been tested with the following module versions:
- StdEnv/2020
- gcc/9.3.0
- cuda/11.4
- python/3.10
- arrow/13

Creating Virtual Environment
----------
First, copy the `trl` folder (included in the supplementary .zip file) into the `rlsf-codegen` folder as a sub-directory. Then run the following:

```
module load StdEnv/2020 gcc/9.3.0 cuda/11.4 python/3.10
module load arrow/13
virtualenv --no-download --clear ~/rlsf
source ~/rlsf/bin/activate

pip install torch==2.0.1
pip install transformers==4.25.1
pip install datasets==2.14.6
pip install accelerate==0.23.0
pip install trl==0.7.4
pip install peft==0.6.2
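pip install transformers==4.38.0  # replaces the 4.25.1 pin installed above
pip install sentencepiece
pip install "huggingface_hub[cli]"  # quoted so the shell does not treat [cli] as a glob
pip install bitsandbytes
pip install accelerate==0.27.2  # replaces the 0.23.0 pin installed above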
pip install wandb
pip install jsonlines

cd trl/
pip install .
cd ..
deactivate
```
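
Because `transformers` and `accelerate` are each pinned twice in the commands above, it can help to confirm which versions ended up in the environment. The following is a minimal sketch, not part of the repository: `pinned_version` is a hypothetical helper that reads `pip freeze`-style `name==version` lines on standard input and prints the version recorded for one package.

```shell
# Hypothetical helper (not part of this repository).
# Reads "name==version" lines on stdin and prints the version of the package
# named in $1. Package names are compared case-insensitively.
pinned_version() {
  awk -F'==' -v pkg="$1" 'tolower($1) == tolower(pkg) { print $2 }'
}
```

After re-activating the environment, `pip freeze | pinned_version transformers` should report the later pin if the installs ran in the order shown. Packages installed from a local folder, such as `trl`, are typically listed by `pip freeze` without `==`, so the helper prints nothing for them.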

Setting up Accelerate Config
----------
```
module load StdEnv/2020 gcc/9.3.0 cuda/11.4 python/3.10
module load arrow/13
source ~/rlsf/bin/activate

accelerate config
```

- Next choose these options in order:
```
➔  This machine
➔  multi-GPU
➔  How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
➔  Should distributed operations be checked while running for errors? This can avoid timeout issues but will be slower. [yes/NO]: no
➔  Do you wish to optimize your script with torch dynamo?[yes/NO]: no
➔  Do you want to use DeepSpeed? [yes/NO]: no
➔  Do you want to use FullyShardedDataParallel? [yes/NO]: no
➔  Do you want to use Megatron-LM ? [yes/NO]: no
➔  How many GPU(s) should be used for distributed training? [1]: 4
➔  What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]: all
➔  fp16
```

- It will output: `accelerate configuration saved at /home/<path>/.cache/huggingface/accelerate/default_config.yaml`
- Open that file
- Copy the contents of `./rlsf-codegen/accelerate_config.yaml` into it and save
- Your final config file path is `/home/<path>/.cache/huggingface/accelerate/default_config.yaml`
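
The manual copy in the steps above can also be done from the shell. This is a minimal sketch, assuming the intent is to overwrite the generated file entirely; `install_accelerate_config` is a hypothetical helper, not part of the repository, and it takes both paths as arguments so that nothing about your home directory is assumed.

```shell
# Hypothetical helper (not part of this repository).
# $1: config shipped with the repository, $2: file written by `accelerate config`.
install_accelerate_config() {
  src="$1"
  dest="$2"
  if [ ! -f "$src" ]; then
    echo "missing source config: $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  # Keep a backup of the generated file so the two can be compared later.
  if [ -f "$dest" ]; then
    cp "$dest" "$dest.bak"
  fi
  cp "$src" "$dest"
}
```

For example, `install_accelerate_config ./rlsf-codegen/accelerate_config.yaml /home/<path>/.cache/huggingface/accelerate/default_config.yaml`, with `<path>` filled in from the message that `accelerate config` printed.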

Supervised Fine-Tuning
----------
- Open `./rlsf-codegen/<LLM>_SFT_main_multiGPU_run_script.sh`
- On line 16, set the `config_file` path to your own config file path (the final path listed under "Setting up Accelerate Config")
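
Line numbers can shift if a run script is edited, so it may be worth confirming where the path actually sits before changing it. This is a minimal sketch, assuming the script refers to the path with the text `config_file` (as the inference command in this README does); `find_config_file_lines` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of this repository).
# Prints every line of the file in $1 that mentions config_file,
# each prefixed with its line number.
find_config_file_lines() {
  grep -n 'config_file' "$1"
}
```

Running `find_config_file_lines ./rlsf-codegen/<LLM>_SFT_main_multiGPU_run_script.sh` lists the matching lines as `number:content`. The same check works for the RL and RLSF run scripts.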

**Fine-tuning:**
```
cd rlsf-codegen
sbatch <LLM>_SFT_main_multiGPU_run_script.sh
```

**Inference:**
```
accelerate launch --config_file /home/<path>/.cache/huggingface/accelerate/default_config.yaml \
                  --num_processes 1 \
                  <LLM>_SFT_inference.py
```

RL with Boolean scalar feedback
----------
- Open `./rlsf-codegen/<LLM>_RL_main_multiGPU_run_script.sh`
- On line 16, set the `config_file` path to your own config file path (the final path listed under "Setting up Accelerate Config")

**Fine-tuning:**
```
cd rlsf-codegen
sbatch <LLM>_RL_main_multiGPU_run_script.sh
```

**Inference:**
- First, set `peft_model_id` in `<LLM>_RL_inference.py` to the path of the appropriate checkpoint
```
cd rlsf-codegen
sbatch <LLM>_RL_inference_run_script.sh
```

RLSF with vector feedback
----------
- Open `./rlsf-codegen/<LLM>_RLSF_main_multiGPU_run_script.sh`
- On line 16, set the `config_file` path to your own config file path (the final path listed under "Setting up Accelerate Config")

**Fine-tuning:**
```
cd rlsf-codegen
sbatch <LLM>_RLSF_main_multiGPU_run_script.sh
```

**Inference:**
- First, set `peft_model_id` in `<LLM>_RL_inference.py` to the path of the appropriate checkpoint
```
cd rlsf-codegen
sbatch <LLM>_RL_inference_run_script.sh
```
