<div align="center">
    <h1>Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning</h1>
</div>


## Environment Setup
To set up the environment, run:
```bash
pip install -r requirements.txt
```
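
If you prefer to keep the dependencies isolated, the install can be wrapped in a virtual environment. The following is a minimal sketch, not part of the provided scripts: the `.venv` directory name and the `setup_env` helper are assumptions made for illustration, and it expects to be run from the repository root where `requirements.txt` lives.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical location for the virtual environment; not prescribed by this repository.
VENV_DIR="${VENV_DIR:-.venv}"

setup_env() {
  # Fail early if we are not in the directory that holds requirements.txt.
  if [ ! -f requirements.txt ]; then
    echo "requirements.txt not found; run this from the repository root" >&2
    return 1
  fi
  # Create the environment, then install with its own interpreter
  # so no activation is needed inside this script.
  python3 -m venv "$VENV_DIR"
  "$VENV_DIR/bin/python" -m pip install -r requirements.txt
}

# Only install when requirements.txt is present in the current directory.
if [ -f requirements.txt ]; then
  setup_env
fi
```

Afterwards, activate the environment in your shell with `source .venv/bin/activate` before running the other commands in this README.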


## Reproduce Results
We provide scripts in the `reproduce` folder so that each result can be reproduced with a single command.

For example:
```bash
bash reproduce/b1_wd1_countdown.sh
```
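
To go through every script in `reproduce` in sequence, a small loop along these lines may help. This is a sketch under assumptions: it presumes each script is independent of the others, and the `run_all` helper and the `DRY_RUN` switch are hypothetical names introduced here, not part of the repository. By default it only prints what it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

REPRODUCE_DIR="${REPRODUCE_DIR:-reproduce}"  # folder named in this README
DRY_RUN="${DRY_RUN:-1}"                      # hypothetical switch: 1 = print only, 0 = execute

run_all() {
  local script
  for script in "$REPRODUCE_DIR"/*.sh; do
    # Skip the unexpanded glob pattern when the folder has no .sh files.
    [ -e "$script" ] || continue
    if [ "$DRY_RUN" = "1" ]; then
      echo "would run: bash $script"
    else
      bash "$script"
    fi
  done
}

run_all
```

Set `DRY_RUN=0` to actually launch the scripts one after another; with `set -e`, the loop stops at the first script that fails.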


## Evaluation
We also provide our output files in the `results` folder, which you can evaluate directly:
```bash
python3 -m b1.eval.parse_and_get_acc --directory "results"
```


## Acknowledgement
Our GRPO implementation is adapted from the d1 and wd1 repositories. We also follow their choice of reasoning datasets. We thank the authors for making their codebases clear.

@article{wd1,
  title   = {{wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models}},
  author  = {Tang, Xiaohang and Dolga, Rares and Yoon, Sangwoong and Bogunovic, Ilija},
  journal = {arXiv preprint arXiv:2507.08838},
  year    = {2025}
}

@article{d1,
  title   = {{d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning}},
  author  = {Zhao, Siyan and Gupta, Devaansh and Zheng, Qinqing and Grover, Aditya},
  journal = {arXiv preprint arXiv:2504.12216},
  year    = {2025}
}
