Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

10 Jan 2026 · ICLR 2025 · CC BY-SA 4.0