# Generalizable Language-Conditioned Policy Learning with LLMs and Minimal Data Requirements

This repository contains the code associated with our submission to ICLR 2025, in which we introduce a new framework for training language-conditioned policies with Large Language Models.

### Experiments
**Setup**

Navigate to the root of the TEDUO folder:
```
cd TEDUO
```
Using conda, create and activate a new environment:
```
conda create -n <environment name> python=3.10.13
conda activate <environment name>
```
Then install the repository requirements.
```
pip install -r requirements.txt
```
Navigate to the modified minigrid (BabyAI) fork and install it.
```
cd ../minigrid/
pip install -e .
cd ../TEDUO/
```

**Data**

You can collect the dataset of state-action transitions $\mathcal{D}$ with the following command:
```
python source/collection/collect_data_minigrid.py --name <dataset name> --env BabyAI-BossCustomLevel-v0 
```

The training goals $G^{tr}$ used for the benchmark are generated automatically by source/abstraction/generate_goal.py during the TEDUO pipeline.


**Running TEDUO pipeline**

You can run the full TEDUO pipeline on the collected data by running the following commands in order:

**STEP 1:**
- Building the goal-conditioned abstract states
```
python source/abstraction/abstraction_function.py --name <collected dataset name> --output_name <abstract dataset name>
```
- Generating supervised datasets with an LLM to train reward functions
```
python source/reward/building_goal_detection_dataset.py --name <abstract dataset name> --size 5000
```

- Training lightweight neural networks as reward functions
```
python source/reward/train_reward_goal_model.py --name <abstract dataset name> 
```

**STEP 2:**
- Solving the abstract MDPs
```
python source/policy/build_parallele_Q_policy.py --name <abstract dataset name> 
```
- Generating the supervised fine-tuning dataset $D^{SFT}$ 
```
python data/generate_sequence.py --name <abstract dataset name> 
```

**STEP 3:**
- Fine-tuning an LLM with $D^{SFT}$ (the parameters are designed for a cluster of 4 A100 GPUs with 80GB of VRAM).
```
accelerate launch --mixed_precision bf16 source/fine_tuning/fine_tuning_SFT.py --name <abstract dataset name>
python source/fine_tuning/merged_model.py --peft_model SFT-<abstract dataset name> --merged_model_name <merged model name>
```


**Testing**

The goal-conditioned policy generated by the TEDUO pipeline can be tested by running the following command:
```
python source/evaluation/online_evaluation.py --env BabyAI-BossCustomLevel-v0 --n_seed 10 --alternative_goal True --model_name <merged model name>
```

The results are saved in results/online_evaluation.








