Because the supplementary materials total close to 100 GB, all data, code, and checkpoints (URSA-8B, URSA-8B-RM, URSA-8B-PS-GRPO) are hosted on Google Drive at [https://drive.google.com/drive/folders/1vPOCIexowBqhbsCB6vUXr7EkNT9xNkP_?usp=sharing](https://drive.google.com/drive/folders/1vPOCIexowBqhbsCB6vUXr7EkNT9xNkP_?usp=sharing), shared under an **anonymous sharer ID**.


