Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai
Published: 01 Jan 2025, Last Modified: 12 May 2025
ICLR 2025
CC BY-SA 4.0