EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Chengjun Pan, Shichun Liu, Jiahang Lin, Dingwei Zhu, Jiazheng Zhang, Shihan Dou, Songyang Gao, Zhenhua Han, Binghai Wang, Rui Zheng, Xuanjing Huang, Tao Gui, Yansong Feng

Published: 2026, Last Modified: 25 May 2026, CoRR 2026, CC BY-SA 4.0