Larger or Smaller Reward Margins to Select Preferences for LLM Alignment?

Kexin Huang, Junkang Wu, Ziqian Chen, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang

15 Jan 2026 (modified: 22 Jan 2026) | ICML 2025 | CC BY-SA 4.0