Uncertainty-penalized reinforcement learning from human feedback with diversified reward LoRA ensembles

Yuanzhao Zhai, Yu Lei, Han Zhang, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang

Published: 01 Apr 2026, Last Modified: 23 Jan 2026 | Information Processing & Management | CC BY-SA 4.0