KL Penalty Control via Perturbation for Direct Preference Optimization.

Sangkyu Lee, Janghoon Han, Hosung Song, Stanley Jungkyu Choi, Honglak Lee, Youngjae Yu

07 Nov 2025 | CoRR 2025 | Everyone | CC BY-SA 4.0