# Test-Time Curricula for Targeted RL

The script `training/verl_training.sh` is the entry point for the RL training.

The `data` directory contains the code to reproduce the construction of our verifiable corpus for RL training.

## Modified external submodule

Our implementation builds upon a publicly available codebase. For the convenience of reviewers and to ensure reproducibility, we have included the necessary code from this project as a submodule within this submission.

- **`TTRL`**: This is a modified version of the code from Zuo et al. (2025).

```bibtex
@article{zuo2025ttrl,
	title        = {TTRL: Test-Time Reinforcement Learning},
	author       = {Zuo, Yuxin and Zhang, Kaiyan and Qu, Shang and Sheng, Li and Zhu, Xuekai and Qi, Biqing and Sun, Youbang and Cui, Ganqu and Ding, Ning and Zhou, Bowen},
	year         = 2025,
	journal      = {arXiv preprint arXiv:2504.16084}
}
```
