# Guided Stream of Search

This code builds on the following repositories, with modifications:
- https://github.com/kanishkg/stream-of-search
- https://github.com/Cornell-RL/tril


## Environment settings

```
conda env create --name countdown --file environment.yaml
conda activate countdown
cd stream-of-search
pip install -r requirements.txt
cd ..
cd tril
pip install -e .
pip install flash-attn --no-build-isolation
```

## Dataset

```
cd stream-of-search
sh scripts/task/gen_task.sh
sh scripts/task/gen_task_final.sh
```

## Unsupervised pre-training

```
cd stream-of-search
sh scripts/gpt2/train_sft.sh
```

## Supervised fine-tuning

```
cd stream-of-search

# Iteration 1
sh scripts/gpt2/star1/gen_final_rand_s0.sh --start 0
# ... repeat with each intermediate --start value (elided here) ...
sh scripts/gpt2/star1/gen_final_rand_s0.sh --start 199000
sh scripts/gpt2/star1/train_final_rand_s0.sh

# Iteration 2
sh scripts/gpt2/star2/gen_final_rand_s0.sh --start 0
# ... repeat with each intermediate --start value (elided here) ...
sh scripts/gpt2/star2/gen_final_rand_s0.sh --start 199000
sh scripts/gpt2/star2/train_final_rand_s0.sh

# Iteration 3
sh scripts/gpt2/star3/gen_final_rand_s0.sh --start 0
# ... repeat with each intermediate --start value (elided here) ...
sh scripts/gpt2/star3/gen_final_rand_s0.sh --start 199000
sh scripts/gpt2/star3/train_final_rand_s0.sh
```
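
The repeated generation calls in each iteration could be scripted with a loop. The sketch below is not part of the original scripts. It assumes the `--start` values advance in fixed increments, and the increment of 1000 (`STEP`) is a hypothetical placeholder, since the intermediate values are elided above. `ITER`, `DRY_RUN`, and `gen_commands` are likewise illustrative names. By default it only prints the commands so they can be checked before anything runs.

```shell
#!/bin/sh
# Sketch: enumerate the generation commands for one iteration, then the training command.
# ASSUMPTION: STEP=1000 is hypothetical; the README elides the intermediate --start values.
STEP="${STEP:-1000}"
LAST_START=199000          # last --start value shown in the README
ITER="${ITER:-1}"          # 1, 2, or 3, matching scripts/gpt2/star1..star3
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only, 0 = execute them

gen_commands() {
  start=0
  while [ "$start" -le "$LAST_START" ]; do
    echo "sh scripts/gpt2/star${ITER}/gen_final_rand_s0.sh --start ${start}"
    start=$((start + STEP))
  done
  # Training runs once, after all generation calls.
  echo "sh scripts/gpt2/star${ITER}/train_final_rand_s0.sh"
}

if [ "$DRY_RUN" = "1" ]; then
  gen_commands
else
  # Run from inside stream-of-search; -e stops at the first failing command.
  gen_commands | sh -e
fi
```

Saved as, say, `run_iter.sh` inside `stream-of-search`, it might be invoked as `ITER=2 DRY_RUN=0 sh run_iter.sh`. Check the actual chunk size expected by `gen_final_rand_s0.sh` and set `STEP` to match before running it for real.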

## RL fine-tuning

```
cd tril
sh examples/countdown/countdown_hppo.sh alg.countdown.args.seed=0
```
