OpenReview.net
Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers.
Rihui Xin, Han Liu, Zecheng Wang, Yupeng Zhang, Dianbo Sui, Xiaolin Hu 0001, Bingning Wang
21 Jan 2026
CoRR 2025
Everyone
CC BY-SA 4.0