Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers.

Rihui Xin, Han Liu, Zecheng Wang, Yupeng Zhang, Dianbo Sui, Xiaolin Hu, Bingning Wang

21 Jan 2026, CoRR 2025, CC BY-SA 4.0