Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

Published: 01 Jan 2025, Last Modified: 12 May 2025 · ICLR 2025 · CC BY-SA 4.0