# LEAP: Learning from Experts with Access to Privilege

## Setup

### Setup AlfWorld
Clone the alfworld repository and follow the instructions in its README to get the game files.

Create an `env_assets` folder with `mkdir -p env_assets/alfworld` and copy the data to `env_assets/alfworld`. Then set the following environment variable:
```bash
export ALFWORLD_DATA=</path/to/env_assets/alfworld>
```
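
Before launching AlfWorld, it can help to confirm that `ALFWORLD_DATA` is set and points to an existing directory. The snippet below is a minimal sketch of such a check. The helper name `check_alfworld_data` is hypothetical (it is not part of this repository or of alfworld), and it does not validate the contents of the directory.

```python
import os
from pathlib import Path


def check_alfworld_data(env=None):
    """Return the ALFWORLD_DATA directory as a Path, or raise if it is unusable.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    value = env.get("ALFWORLD_DATA")
    if not value:
        raise RuntimeError("ALFWORLD_DATA is not set; export it as shown above.")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"ALFWORLD_DATA points to a missing directory: {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"ALFWORLD_DATA -> {check_alfworld_data()}")
    except RuntimeError as err:
        print(f"Check failed: {err}")
```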

### Setup Webshop

First, clone the repo
```bash
git clone https://github.com/princeton-nlp/WebShop
```

Then create a conda environment

```bash
conda create -n decision_oaif python=3.10
```

Then activate it
```bash
conda activate decision_oaif
```

Then install pyserini using conda, following its [installation guide](https://github.com/castorini/pyserini/blob/master/docs/installation.md).

If you are on macOS:
```bash
conda install wget -y
conda install -c conda-forge openjdk=21 maven -y
conda install -c conda-forge lightgbm nmslib -y
conda install -c pytorch faiss-cpu pytorch -y

pip install pyserini
```
Otherwise, if you are on Linux:
```bash
conda install -c conda-forge openjdk=21
pip install torch faiss-cpu
pip install pyserini
```

Then install the dependencies listed in WebShop's `requirements.txt`:
```bash
pip install -r requirements.txt
```

Then install the remaining WebShop dependencies:
```bash
conda install -c pytorch faiss-cpu;
python -m spacy download en_core_web_lg
```

Then run the following from the WebShop directory to download the data and set up the search engine:

```bash
mkdir -p data;
cd data;
gdown "https://drive.google.com/uc?id=1EgHdxQ_YxqIQlvvq5iKlCrkEKR6-j0Ib"; # items_shuffle_1000 - product scraped info
gdown "https://drive.google.com/uc?id=1IduG0xl544V_A_jv3tHXC0kyFi7PnyBu"; # items_ins_v2_1000 - product attributes
gdown "https://drive.google.com/uc?id=1A2whVgOO0euk5O13n2iYDM0bQRkkRduB"; # items_shuffle
gdown "https://drive.google.com/uc?id=1s2j6NgHljiZzQNL3veZaAiyW_qDEgBNi"; # items_ins_v2
gdown "https://drive.google.com/uc?id=14Kb5SPBk_jfdLZ_CDBNitW98QLDlKR5O" # items_human_ins
cd ..

cd search_engine
mkdir -p resources resources_100 resources_1k resources_100k
python convert_product_file_format.py # convert items.json => required doc format
mkdir -p indexes
./run_indexing.sh
cd ..
```

Then install the dependencies listed in decision_oaif's `requirements.txt`:
```bash
pip install -r requirements.txt
```

Then install both WebShop and decision_oaif in editable mode by running the following from the root of each repository:
```bash
pip install -e .
```

#### Only for live test on the web browser
Run the webshop server.
```bash
bash bash/run_webshop_server.sh
```

If you installed everything correctly as above, you should see a website at [http://localhost:3000/ABC](http://localhost:3000/ABC).
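
If you prefer to check the server from a script rather than a browser, the following is a minimal sketch using only the standard library. The function name `is_server_up` is hypothetical. It treats any HTTP response, including an error status, as evidence that the server is reachable, and it says nothing about whether the page content is correct.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://localhost:3000/ABC", timeout=5.0):
    """Return True if an HTTP GET to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False
```

Calling `is_server_up()` with no arguments probes the URL mentioned above; a `False` result usually means the server script is not running or is listening on a different port.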

### Download training data
Download the data into a `data/` directory:
```bash
gdown "https://drive.google.com/uc?id=1YxegO5hR3bJvHdmQBryfoUldYVgZPCzc"
unzip data.zip
rm data.zip
```

### Setting up OpenAI Env
Ensure you have a `.env` file with your OpenAI API key and organization ID:

```
OPENAI_API_KEY=your_openai_api_key
OPENAI_ORGANIZATION=your_openai_organization_id
```
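
This README does not show how the repository loads the `.env` file, so the snippet below is only a standalone sanity check, not the project's loader. It is a minimal sketch that parses simple `KEY=VALUE` lines with the standard library and reports which of the two keys above are missing or empty. The names `read_env_file` and `missing_keys` are hypothetical.

```python
from pathlib import Path

# The two keys named in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENAI_ORGANIZATION")


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(read_env_file(".env"))` returns an empty list when both keys are present and non-empty.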

### Setting up Wandb

Add the following to `~/.bashrc`
```bash
export WANDB_API_KEY=<api key>
export WANDB_PROJECT=<project>
export WANDB_ENTITY=<entity>
```

## Evaluation

### Evaluation of Alfworld
Configure the agents you would like to evaluate in `configs/eval_alfworld.yaml` and run the following script:

```bash
python scripts/eval/eval_alfworld.py --eval_config configs/eval_alfworld.yaml
```
This creates a folder in `data/eval/alfworld/`, named with the current datetime, where the logs and `summary.csv` are saved.

### Evaluation of Webshop
Make sure the WebShop server is running in another terminal tab:
```bash
bash bash/run_webshop_server.sh
```

Configure the agents you would like to evaluate in `configs/eval_webshop.yaml` and run the following script:
```bash
python scripts/eval/eval_webshop.py --eval_config configs/eval_webshop.yaml
```
This creates a folder in `data/eval/webshop/`, named with the current datetime, where the logs and `summary.csv` are saved.
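
Because each evaluation run writes to its own datetime folder, a small helper for finding the latest run and reading its summary can be handy. The sketch below makes two assumptions that this README does not confirm: that `summary.csv` sits directly inside the run folder, and that the most recently modified folder is the latest run. The function names are hypothetical, and no particular column names are assumed.

```python
import csv
from pathlib import Path


def latest_run_dir(environment, root="data/eval"):
    """Return the most recently modified run folder under {root}/{environment}/."""
    env_dir = Path(root) / environment
    runs = [p for p in env_dir.iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run folders under {env_dir}")
    return max(runs, key=lambda p: p.stat().st_mtime)


def load_summary(run_dir):
    """Read summary.csv from a run folder as a list of row dictionaries."""
    with open(Path(run_dir) / "summary.csv", newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `load_summary(latest_run_dir("webshop"))` returns one dictionary per row of the latest WebShop summary, keyed by whatever column headers the file contains.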

## Preliminaries: Collect Data

### Generate raw logs

#### Generating alfworld logs

The script below runs the human-collected logs for every AlfWorld game. This can take a while because loading an AlfWorld game is slow.
```bash
python scripts/dataproc/collect_logs_alfworld.py --config configs/training_alfworld.yaml
```

#### Generating webshop logs

The script below will read pre-collected logs from webshop.
```bash
python scripts/dataproc/collect_logs_webshop.py --config configs/training_webshop.yaml
```

### Generate Reasoning Logs

To generate reasoning logs, use the following command:

```bash
python scripts/dataproc/annotate_reason.py --config configs/training_{environment}.yaml
```

### Extract Privileged State

#### Extract privileged state from logs (e.g. alfworld)
Run the following to extract the privileged state from the collected logs:
```bash
python scripts/dataproc/extract_privileged_state_from_logs.py --config configs/training_alfworld.yaml
```

#### Extract privileged state for webshop
Run the following to extract the privileged state for WebShop:
```bash
python scripts/dataproc/extract_privileged_state_webshop.py --config configs/training_webshop.yaml
```

## Iterative Training Loop

Let's go through all the steps of a generic training iteration.

### Roll out previous iteration model (skip this for iter0 training)
We first take the previous iteration's model and roll it out in the environment, collecting trajectories in `data/{environment}/corrections/{iter_id-1}/rollout`:

```bash
python scripts/eval/eval_{environment}.py --training_config configs/training_{environment}.yaml --iter {iter_id-1}
```

### Generate corrections on rollout trajectory (skip this for iter0 training)
We then invoke the correction oracle on the rollouts to generate corrections in `data/{environment}/corrections/{iter_id-1}/correction`:

```bash
python scripts/dataproc/correct_student_trajectory.py --config configs/training_{environment}.yaml --iter {iter_id-1}
```
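
The `{iter_id-1}` in the two paths above is easy to get wrong by one. The sketch below spells out the arithmetic for the rollout and correction directories that feed training iteration `iter_id`. The helper `correction_dirs` is hypothetical and only mirrors the path templates quoted in this section.

```python
from pathlib import Path


def correction_dirs(environment, iter_id, root="data"):
    """Return (rollout_dir, correction_dir) used when training iteration `iter_id`.

    Both live under the previous iteration's index, iter_id - 1.
    """
    if iter_id < 1:
        raise ValueError(
            "iteration 0 has no previous model, so there are no rollouts or corrections"
        )
    base = Path(root) / environment / "corrections" / str(iter_id - 1)
    return base / "rollout", base / "correction"
```

For example, training iteration 2 on AlfWorld reads from `data/alfworld/corrections/1/rollout` and `data/alfworld/corrections/1/correction`.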

### Generate Training Data

To generate training data for iteration `{iter_id}`, use the following command:

```bash
python scripts/dataproc/create_training_data.py --config configs/training_{environment}.yaml --train_method {train_method} --iter {iter_id}
```

### Train SFT model

Run the SFT script for your environment, passing the iteration ID as the argument (the script below is the AlfWorld one):

```bash
bash bash/2408/train-0804-sft-alfworld-iterx.sh {iter_id}
```
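
Putting the steps so far together, the sketch below assembles the commands for one iteration in order and skips the rollout and correction steps for iteration 0. It is a dry-run helper under stated assumptions, not a script from this repository: the function name `iteration_commands` is hypothetical, `TRAIN_METHOD` is a placeholder for a real `--train_method` value, and the SFT script path is only known for AlfWorld, so it is passed in explicitly.

```python
import shlex


def iteration_commands(environment, iter_id, train_method, sft_script=None):
    """List the shell commands for one training iteration, in execution order."""
    config = f"configs/training_{environment}.yaml"
    commands = []
    if iter_id > 0:
        # Rollout and correction both use the previous iteration's index.
        prev = str(iter_id - 1)
        commands.append(["python", f"scripts/eval/eval_{environment}.py",
                         "--training_config", config, "--iter", prev])
        commands.append(["python", "scripts/dataproc/correct_student_trajectory.py",
                         "--config", config, "--iter", prev])
    commands.append(["python", "scripts/dataproc/create_training_data.py",
                     "--config", config, "--train_method", train_method,
                     "--iter", str(iter_id)])
    if sft_script is not None:
        commands.append(["bash", sft_script, str(iter_id)])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # "TRAIN_METHOD" is a placeholder, not a known valid value.
    for cmd in iteration_commands(
        "alfworld", 1, "TRAIN_METHOD", "bash/2408/train-0804-sft-alfworld-iterx.sh"
    ):
        print(shlex.join(cmd))
```

Returning argument lists rather than strings keeps the commands safe to hand to `subprocess.run` later without extra quoting.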

### Upload model to huggingface

Edit `upload_models_to_hf.py` to pass in the local model name, and upload the model as `Meta-Llama-3-8B-Instruct-sft-{environment}-{iter_id}`.