# PilotRL: Training Language Model Agents via Global Planning-Guided Progressive Reinforcement Learning

![license](https://img.shields.io/github/license/modelscope/modelscope.svg)

Python implementation of ***PilotRL***, a global planning-guided training framework for LLM agents driven by progressive reinforcement learning. We achieve improvements on open-source models in the LLaMA ([LLaMA3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)) and Qwen ([Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)) series. In our experiments, LLaMA3.1-8B-Instruct + PilotRL surpasses the closed-source GPT-4o by 3.60% and shows a more substantial gain of 55.78% compared to GPT-4o-mini at a comparable parameter scale.

The graphic below provides an overview of the PilotRL pipeline.

![Illustration of PilotRL.](figures/AgentRL_pipeline.svg)

## Installation
To get started, please install the required packages:
```bash
pip install -r ./myverl/verl/requirements.txt
```

## Data Construction
The scripts for data construction are in `./data_construction/`.
```bash
python ./data_construction/plan_traj_generation_parallel.py
```

## Training
The reinforcement learning (RL) framework is built on `verl`, with Group Relative Policy Optimization (GRPO) as the learning algorithm. The three training stages are launched with the following scripts:
```bash
bash agent_stage1.sh # Stage 1
bash agent_stage2.sh # Stage 2
bash agent_stage3.sh # Stage 3
```
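For readers unfamiliar with GRPO, the snippet below is a minimal sketch of its core idea: the rewards of several responses sampled for the same prompt are normalized against each other, so no separate value model is needed. It is an illustration only and is not taken from this repository or from `verl`. The function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration; `eps` guards against division by
    zero when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four responses sampled for one prompt, scored by a reward function.
rewards = [1.0, 0.0, 0.5, 0.0]
advantages = group_relative_advantages(rewards)
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one. When all responses in a group score the same, every advantage is zero and the group contributes no policy-gradient signal.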

## Case Study

![BabyAI case study: PilotRL.](figures/AgentRL_case_BabyAI_PilotRL.svg)

![BabyAI case study: ReAct.](figures/AgentRL_case_BabyAI_ReAct.svg)
