# RLSF: Reinforcement Learning via Symbolic Feedback

This repository contains the code for _Game of 24 using RLSF_.

Dependencies
----------
The repository is tested on the following module versions:
- StdEnv/2020
- gcc/9.3.0
- cuda/11.4
- python/3.10
- arrow/13

Creating Virtual Environment
----------
First, copy the `trl` folder (included in the supplementary .zip file) into the `rlsf-tot` folder as a sub-directory. Then run the following:

```
module load StdEnv/2020 gcc/9.3.0 cuda/11.4 python/3.10
module load arrow/13
virtualenv --no-download --clear ~/rlsf
source ~/rlsf/bin/activate

pip install torch==2.0.1
pip install transformers==4.38.0
pip install datasets==2.14.6
pip install accelerate==0.27.2
pip install peft==0.6.2
pip install sentencepiece
pip install "huggingface_hub[cli]"
pip install bitsandbytes
pip install wandb
pip install jsonlines

cd trl/
pip install .
cd ..
deactivate
```

Next, install tree-of-thoughts by following the steps in its README.

Baselines
----------
Input-output (IO) prompting

```
python -u run.py \
    --task game24 \
    --task_start_index 900 \
    --task_end_index 1000 \
    --naive_run \
    --prompt_sample standard \
    --n_generate_sample 10
```

Chain-of-thought (CoT) prompting

```
python -u run.py \
    --task game24 \
    --task_start_index 900 \
    --task_end_index 1000 \
    --naive_run \
    --prompt_sample cot \
    --n_generate_sample 10
```

Tree-of-thoughts (ToT) prompting (the breadth limit b is set by `--n_select_sample`)

```
python -u run.py \
    --task game24 \
    --task_start_index 900 \
    --task_end_index 1000 \
    --method_generate propose \
    --method_evaluate value \
    --method_select greedy \
    --n_evaluate_sample 3 \
    --n_select_sample 5
```

RL with Boolean scalar feedback
----------
`python -u rlsf.py --feedback_mode binary`
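
To illustrate what a Boolean scalar reward for Game of 24 could look like, here is a minimal sketch. It is not taken from `rlsf.py`: the function name `binary_feedback` and the input format (a list of puzzle numbers plus a single arithmetic expression) are assumptions made for illustration. The reward is 1 only if the expression uses exactly the puzzle numbers and evaluates to 24, and 0 otherwise.

```python
import ast
import re
from fractions import Fraction


def _evaluate(node):
    """Exactly evaluate a parsed expression built from integers and + - * /."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        return Fraction(node.value)
    if isinstance(node, ast.BinOp):
        left, right = _evaluate(node.left), _evaluate(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
        if isinstance(node.op, ast.Div):
            return left / right  # raises ZeroDivisionError when right == 0
    raise ValueError("unsupported expression")


def binary_feedback(puzzle_numbers, expression):
    """Hypothetical scalar reward: 1 for a valid solution, 0 for anything else."""
    # The expression must use each puzzle number exactly once.
    used = sorted(int(token) for token in re.findall(r"\d+", expression))
    if used != sorted(puzzle_numbers):
        return 0
    try:
        value = _evaluate(ast.parse(expression, mode="eval"))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0
    return 1 if value == 24 else 0
```

Under these assumptions, `binary_feedback([4, 9, 10, 13], "(10 - 4) * (13 - 9)")` returns 1, while an expression that reuses or omits a number returns 0. Exact `Fraction` arithmetic is used so that solutions with fractional intermediate values, such as `6 / (1 - 3 / 4)`, are not rejected because of floating-point rounding.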

RL with vector feedback
----------
`python -u rlsf.py --feedback_mode cert`
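
In contrast to a single scalar, vector feedback assigns a score to each part of the model's output. The sketch below shows one way this could work for Game of 24. The certificate format actually produced by `--feedback_mode cert` is not documented here, so the function name `vector_feedback`, the step format (`a op b = c`), and the 0/1 scoring are all assumptions made for illustration. Each intermediate step is checked on its own, yielding one score per step.

```python
import operator
import re
from fractions import Fraction

# Assumed step format: "<int> <op> <int> = <int or fraction>", e.g. "10 - 4 = 6".
STEP_PATTERN = re.compile(
    r"^\s*(\d+)\s*([-+*/])\s*(\d+)\s*=\s*(-?\d+(?:/\d+)?)\s*$"
)
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def vector_feedback(steps):
    """Hypothetical per-step reward: 1 if a step's arithmetic is correct, else 0."""
    scores = []
    for step in steps:
        match = STEP_PATTERN.match(step)
        if match is None:
            scores.append(0)  # malformed step
            continue
        left, op, right, claimed = match.groups()
        try:
            actual = OPS[op](Fraction(left), Fraction(right))
            scores.append(1 if actual == Fraction(claimed) else 0)
        except ZeroDivisionError:
            scores.append(0)  # division by zero in the step or the claimed value
    return scores
```

For example, under these assumptions `vector_feedback(["10 - 4 = 6", "13 - 9 = 4", "6 * 4 = 25"])` returns `[1, 1, 0]`, which localizes the error to the third step instead of penalizing the whole output equally.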