When is RL better than DPO in RLHF? A Representation and Optimization Perspective

Published: 01 Jan 2024, Last Modified: 19 Jan 2025. Venue: Tiny Papers @ ICLR 2024. License: CC BY-SA 4.0.