# ThinkTime RL Training
This repo contains the source code for DAPO training of ThinkTime-14B, including the RL implementation for iTCoT.

## Installation
1. Run `pip3 install -e ./`

## Steps to Reproduce
1. Make sure that you have followed the previous steps in the [ThinkTime Folder](../ThinkTime/README.md): all datasets must be generated and the Warm-Up SFT model must already be trained.
2. Set the dataset and model path entries in `recipes/ThinkTime/grpo/config_demo.yaml`
3. Run `bash scripts/train_ts_14b.sh`
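
Before launching step 3, it can help to confirm that the paths set in step 2 actually exist. The snippet below is a minimal sketch of such a check, not part of this repo. It assumes the YAML file has already been loaded into a plain Python dict (for example with PyYAML's `yaml.safe_load`) and that path entries use keys containing the word `path`. The helper names and the key names in the example (`model_name_or_path`, `train_path`) are hypothetical and may not match the actual entries in `config_demo.yaml`.

```python
import os
from typing import Any, Iterator, List, Tuple


def iter_path_entries(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for string values whose key contains 'path'."""
    if isinstance(node, dict):
        for key, value in node.items():
            dotted = f"{prefix}.{key}" if prefix else str(key)
            if isinstance(value, str) and "path" in str(key).lower():
                yield dotted, value
            else:
                # Recurse into nested mappings and lists; scalars yield nothing.
                yield from iter_path_entries(value, dotted)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_path_entries(item, f"{prefix}[{index}]")


def find_missing_paths(config: dict) -> List[Tuple[str, str]]:
    """Return the path-like entries that do not exist on the local filesystem."""
    return [(key, value) for key, value in iter_path_entries(config)
            if not os.path.exists(value)]


# Hypothetical config contents, used only for illustration.
example_config = {
    "model_name_or_path": "/nonexistent/warmup-sft-checkpoint",
    "dataset": {"train_path": os.getcwd()},
}

for key, value in find_missing_paths(example_config):
    print(f"Missing path for '{key}': {value}")
```

Note that this check only looks at the local filesystem, so an entry that refers to a remote model or dataset identifier rather than a local directory would be reported as missing.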

## Reference
This code is built on [open-r1](https://github.com/huggingface/open-r1) and [trl](https://github.com/huggingface/trl). We will comply with the relevant license requirements and open-source the code after acceptance of this paper.
