Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
Juntao Dai, Taiye Chen, Yaodong Yang, Qian Zheng, Gang Pan
Published: 01 Jan 2025, Last Modified: 01 Aug 2025
ICLR 2025
CC BY-SA 4.0