# 📜 Goal-Guided Efficient Exploration in Reinforcement Learning via Large Language Model

This repository contains the official implementation of **SGRL (Structured Goal-guided Reinforcement Learning)**, a novel framework that integrates LLMs to improve RL agent's exploration capabilities in open-world environments.
<div align="left">
  <img src="image/framework.png" width="100%" alt="SGRL Framework"/>
  <p><em>Figure: (a) Structured Goal Planner: generates prioritized goals from environment states using distilled task knowledge;
	(b) Goal-Conditioned Action Pruner: filters invalid or irrelevant actions based on current goals;
	(c) The overall framework of SGRL.</em></p>
</div>

> 🔔 **To reproduce the results in the paper, please follow the instructions below.**

---

## 📦 Installation & Setup

### 1. Create Conda Environment

```bash
conda create -n SGRL python=3.10
conda activate SGRL
pip install --upgrade pip
pip install -r requirements.txt
```



### 2. Configure LLM Access

Edit the LLM configuration in `Craftax/lm.py` to connect to your LLM backend (e.g., OpenAI, local API, or custom server):

```python
URL = ""        # API endpoint URL 
API_KEY = ""    # Your API key
MODEL_NAME = ""  # or your desired LLM
```
---

## 3. Download Embedding Model

SGRL leverages the `paraphrase-MiniLM-L6-v2` model to **encode semantic goals into dense vector representations (embeddings)**. These goal embeddings are concatenated with the environment state to form an augmented input for the policy network, enabling goal-conditioned decision making.

Please download the model before training: [paraphrase-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2)

## 🚀 Training the Agent

### 🔧 Key Configuration Flags

| Flag | Description |
|------|-------------|
| `--use_goal` | Enable goal-guided exploration |
| `--use_wandb` | Log metrics to [Weights & Biases](https://wandb.ai) |
| `--goal_type SGRL` | Use SGRL with priority-based goal selection |
| `--goal_type SGRL_nopri` | Use SGRL without priority |
| `--epsilon 1 --min_epsilon 1e-6` | Enable dynamic action pruner |
| `--epsilon 1e-6 --min_epsilon 1e-6` | Use static (fixed) pruner |
| `--epsilon 1 --min_epsilon 1` | Disable pruner entirely |
| `--epsilon_type linearann` | Linear annealing for $\epsilon$ |
| `--epsilon_type expann` | Exponential annealing |
| `--epsilon_type 3stagelinear` | Three-stage linear annealing |
| `--epsilon_type 3stagecos` | Three-stage cosine annealing |

### 🏃‍♂️ Example: Train SGRL with 3-Stage Cosine Annealing

```bash
python train.py \
  --seed 1000 \
  --use_goal \
  --use_wandb \
  --goal_type SGRL \
  --alg_name SGRL_3-Stage_Cos \
  --epsilon_type 3stagecos \
  --clip 0.2 \
  --total_timesteps 10000000
```

For full reproducibility, **all training commands** (including ablations and baselines) are centralized in the script `run.sh`. This includes configurations for different annealing strategies, pruner settings, and goal types.

To run all experiments:
```bash
bash run.sh
```

## 📚 Citation

Our code builds upon the Craftax code with improvements. Thanks to them! If you find this repo useful, please feel free to cite:

```
@inproceedings{matthews2024craftax,
    author = {Michael Matthews and Michael Beukman and Benjamin Ellis and Mikayel Samvelyan and Matthew Jackson and Samuel Coward and Jakob Foerster},
    title  = {Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning},
    booktitle = {International Conference on Machine Learning ({ICML})},
    year = {2024}
}
```
