## Dependencies
Megatron-LM
``` bash
git clone -b core_v0.12.1 https://github.com/NVIDIA/Megatron-LM.git
```
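
After cloning, it can help to confirm that the checkout really sits on `core_v0.12.1` before building on top of it. The helper below is a minimal sketch, not part of either repository: the function names `checkout_ref` and `check_ref` are hypothetical, and it assumes `git` is on your `PATH`. It works whether `core_v0.12.1` resolves to a tag or a branch.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Megatron-LM or Wechat-YATT.

# Print the exact tag a checkout points at, or the branch name if no tag matches.
checkout_ref() {
    local repo="$1"
    git -C "$repo" describe --tags --exact-match 2>/dev/null \
        || git -C "$repo" rev-parse --abbrev-ref HEAD
}

# Succeed only if the checkout is on the expected tag or branch.
check_ref() {
    local repo="$1" expected="$2" actual
    actual="$(checkout_ref "$repo")" || return 1
    [ "$actual" = "$expected" ]
}
```

For example, `check_ref ./Megatron-LM core_v0.12.1 || echo "unexpected Megatron-LM version"` prints a warning when the checkout is on a different ref.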

Wechat-YATT Infra
``` bash
git clone https://github.com/Tencent/Wechat-YATT.git
```

## Datasets
DeepScaleR
``` text
https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset
```

Benchmarks
``` text
https://github.com/QwenLM/Qwen2.5-Math
```

## Our code
1. Copy our files into your `Wechat-YATT` checkout under `tasks/math_rl_v3/qwen/D3S`.

2. Set the model and data paths, then pre-process the model and data as `Wechat-YATT` requires:
    1. Index the data with `tools/data_convert/build_dataset_v3_meta.py`.
    2. Convert the model; see `https://github.com/Tencent/Wechat-YATT/blob/public/docs/source/math_grpo.md`.

3. Run the GRPO launch script, redirecting its output to a log file of your choice:
```bash
bash tasks/math_rl_v3/qwen/D3S/mpirun-grpo.sh > <log_file> 2>&1
```
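
Steps 1 and 3 can be scripted. The following is a minimal sketch under stated assumptions: the function names `install_d3s` and `make_log_path`, the `logs` default directory, and the log file naming scheme are all hypothetical and not defined by `Wechat-YATT`. Only the target path `tasks/math_rl_v3/qwen/D3S` comes from the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: function names, default paths, and log naming are hypothetical.

# Copy the D3S files into a Wechat-YATT checkout and print the target directory.
install_d3s() {
    local src="$1" yatt_root="$2"
    local target="$yatt_root/tasks/math_rl_v3/qwen/D3S"
    mkdir -p "$target" || return 1
    # "src/." copies the directory contents, including dotfiles.
    cp -R "$src/." "$target/" || return 1
    printf '%s\n' "$target"
}

# Create the log directory if needed and print a timestamped log file path.
make_log_path() {
    local log_dir="${1:-logs}"
    mkdir -p "$log_dir" || return 1
    printf '%s/d3s-grpo-%s.log\n' "$log_dir" "$(date +%Y%m%d-%H%M%S)"
}
```

With these defined, something like `install_d3s ./D3S ./Wechat-YATT` places the files, and `bash tasks/math_rl_v3/qwen/D3S/mpirun-grpo.sh > "$(make_log_path)" 2>&1` keeps each run in its own log file. Since the script path is relative, the run command is presumably issued from the `Wechat-YATT` root.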