CoRLHF: Reinforcement learning from human feedback with cooperative policy-reward optimization for LLMs

Qi Liu, Zhuoyang Song, Yuxin Liang, Zejian Xie, Songxin Zhang, Jiaxing Zhang, Yanjie Li

Published: 2026, Last Modified: 01 Mar 2026, Expert Syst. Appl. 2026, CC BY-SA 4.0