# Posterior-GRPO: Rewarding Reasoning Processes in Code Generation

<p align="center">
  <!-- <a href="">[📖paper]</a> &nbsp;&nbsp; -->
  <a href="https://huggingface.co/fff/Coder-P-GRPO-7B">[🤗Code Model With P-GRPO]</a> &nbsp;&nbsp;
  <a href="https://huggingface.co/fff/Math-P-GRPO-7B">[🤗Math Model With P-GRPO]</a> &nbsp;&nbsp;
</p>
<p align="center">
  <a href="https://huggingface.co/fff/OD-Based-Reward-7B">[🤗Thinking Reward Model 7B]</a> &nbsp;&nbsp;
  <a href="https://huggingface.co/fff/OD-Based-Reward-3B">[🤗Thinking Reward Model 3B]</a>
</p>
<p align="center">
  <a href="https://huggingface.co/datasets/fff/Reasoning-Reward">[🤗Dataset for Training Thinking Reward Model]</a>
</p>
<p align="center">
  <a href="https://huggingface.co/datasets/fff/LCB-RB">[🤗Benchmark for Reasoning Processes]</a>
</p>


## Supplementary Materials

1. `ReasoningRL.zip`: the related code.
2. `PGRPO_GRPO_Compare.json`: a collection of qualitative case studies comparing P-GRPO and GRPO, illustrating the improvements in reasoning quality.

