Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Tengyang Xie, Dylan J. Foster, Akshay Krishnamurthy, Corby Rosset, Ahmed Hassan Awadallah, Alexander Rakhlin
Published: 01 Jan 2025, Last Modified: 01 Oct 2025
ICLR 2025
CC BY-SA 4.0