# DAPO: An Open-Source LLM Reinforcement Learning System at Scale

This repository is the official implementation of **DAPO: An Open-Source LLM Reinforcement Learning System at Scale**. 

## Key Results

### AIME 2024 Performance

🚀 **DAPO** achieves 50 points on AIME 2024 with the Qwen2.5-32B base model, outperforming the previous SoTA, DeepSeek-R1-Zero-Qwen-32B, while using 50% of the training steps.

![AIME 2024 performance](img/score.png)

### Metric Supervision during Training

1. **Length stability and growth**: A steady increase in response length gives the model more room to explore and to learn more complex reasoning behaviors, which contributes to training stability and improved performance.

2. **Reward score stability**: A steadily increasing reward signal indicates that the model is fitting the training distribution and that learning remains robust and consistent, without significant fluctuations.

3. **Entropy and mean probability trend**: After an initial decrease, a controlled increase in entropy maintains a healthy balance between exploration and exploitation, avoiding both overfitting and excessive randomness, and supports sustained model performance.

![Training dynamics](img/dynamic.png)
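
The three signals listed above can be reduced to one scalar each per training batch. The following is a minimal sketch of that reduction, not the logging code used in this repository: the function names `token_entropy` and `summarize_batch` are hypothetical, and the batch values are toy numbers chosen for illustration.

```python
import math
from statistics import mean


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def summarize_batch(response_lengths, rewards, per_token_logits):
    """Collapse one batch into the three monitored scalars (hypothetical helper)."""
    return {
        "mean_response_length": mean(response_lengths),
        "mean_reward": mean(rewards),
        "mean_entropy": mean(token_entropy(row) for row in per_token_logits),
    }


# Toy batch (made-up values): two responses, and three tokens over a 4-word vocabulary.
stats = summarize_batch(
    response_lengths=[120, 200],
    rewards=[1.0, 0.0],
    per_token_logits=[
        [0.0, 0.0, 0.0, 0.0],   # uniform distribution: maximum entropy, log(4)
        [5.0, 0.0, 0.0, 0.0],   # peaked distribution: low entropy
        [2.0, 1.0, 0.0, -1.0],
    ],
)
print(stats)
```

Plotting these values against the training step gives curves of the kind shown in the figure; a collapse of `mean_entropy` toward zero or a sudden drop in `mean_response_length` is the sort of change worth investigating.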

## Requirements

We recommend using conda to set up the environment.

### Training Requirements

```bash
conda create -n dapo_train python=3.10

conda activate dapo_train

pip install --no-cache-dir "vllm==0.8.3" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" tensordict torchdata \
    "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
    "numpy<2.0.0" "pyarrow>=15.0.0" pandas \
    "ray[default]" codetiming hydra-core pylatexenc qwen-vl-utils wandb liger-kernel mathruler \
    pytest yapf py-spy pyext pre-commit ruff

wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
    pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

wget -nv https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
    pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl

pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
```

### Evaluation Requirements

```bash
conda create -n dapo_eval python=3.10
conda activate dapo_eval
pip3 install -r eval/requirements.txt
```

## Pre-trained Model

Download the pre-trained Qwen2.5-32B base model from Hugging Face:

- [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B)

## Datasets

We provide training and validation datasets for DAPO training.

Training: [DAPO-Math-17k](data/dapo-math-17k.parquet), a carefully curated and processed math dataset.

Validation: [AIME 2024](data/aime-2024.parquet).

## Training

We provide the training script for DAPO: [DAPO -- AIME 50](verl/recipe/dapo/run_dapo_qwen2.5_32b.sh).

After starting the Ray runtime, submit the job to the Ray cluster **from any machine**:

```bash
cd verl # Repo root
export RAY_DATA_HOME="your/path/to/the/DAPO/repo"
export MODEL_PATH="your/path/to/Qwen2.5-32B"
export RAY_ADDRESS="http://${RAY_IP:-localhost}:8265" # The Ray cluster address to connect to
export WORKING_DIR="${PWD}" # The local directory to package to the Ray cluster
# Runtime environment for the Ray cluster (env vars, pip packages), defined in YAML
export RUNTIME_ENV="./recipe/dapo/runtime_env.yaml" # This enables VLLM_USE_V1=1
bash recipe/dapo/run_dapo_qwen2.5_32b.sh
```
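
Before submitting, it can help to confirm that the variables exported above are actually set in the current shell. The following is a minimal sketch of such a check; the helper `missing_vars` is hypothetical and not part of this repository, and the launch script may supply defaults for some of these variables, so treat a reported name as a prompt to double-check rather than a hard error.

```python
import os

# Variables exported in the launch snippet above.
REQUIRED_VARS = ("RAY_DATA_HOME", "MODEL_PATH", "RAY_ADDRESS", "WORKING_DIR", "RUNTIME_ENV")


def missing_vars(env):
    """Return the names from REQUIRED_VARS that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Check the current process environment and report what is missing, if anything.
missing = missing_vars(os.environ)
print("Missing:", ", ".join(missing) if missing else "none")
```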

## Evaluation

To evaluate the model on AIME 2024, we deploy it with Ray Serve and vLLM.

```bash
# Replace aaa/bbb/ccc with the path to your model
serve run eval.llm:build_app model=aaa/bbb/ccc tensor-parallel-size=8

# In another terminal, run the evaluation script against the deployed model
python eval/eval_aime24.py --temperature 1.0 --top_p 0.7 --max_tokens 20480 --model ccc --test_file data/aime-2024.parquet
```
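
Because the command above samples at temperature 1.0, individual runs can differ. One common way to report a stable score is to draw several samples per problem and average the per-problem accuracy. The sketch below illustrates that aggregation under stated assumptions: the function `score_in_points` and the data layout are hypothetical, the answers are made-up toy values, and this README does not state whether `eval/eval_aime24.py` aggregates in this way.

```python
def score_in_points(predictions, answers):
    """Average per-problem accuracy over all samples, scaled to 0-100 points.

    predictions: {problem_id: [sampled final answers]}
    answers:     {problem_id: reference answer}
    """
    per_problem = []
    for pid, ref in answers.items():
        samples = predictions.get(pid, [])
        correct = sum(1 for s in samples if s == ref)
        # A problem with no samples counts as fully incorrect.
        per_problem.append(correct / len(samples) if samples else 0.0)
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data (not real AIME problems): two problems, four samples each.
answers = {"p1": 204, "p2": 25}
predictions = {"p1": [204, 204, 113, 204], "p2": [25, 30, 30, 30]}
print(score_in_points(predictions, answers))  # 50.0
```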

