Keywords: LLM Alignment; Gap-Aware Preference Optimization
TL;DR: We present Gap-Aware Preference Optimization (GaPO), a novel method that refines RLHF for LLMs by quantifying the semantic gap between preferred and rejected responses and incorporating it into training.
Abstract: Reinforcement learning from human feedback (RLHF) approaches are widely used for fine-tuning large language models (LLMs) to align with instructional preferences. However, traditional RLHF methods often rely on binary preference labels, which record which response in a pair is preferred but not how strongly humans perceive the two to differ, leading to potential performance degradation.
To address this limitation, we introduce $\textbf{Gap-Aware Preference Optimization}$ (GaPO), a novel approach that integrates the degree of semantic gaps into preference optimization. By modifying the margin term in the loss function and replacing it with an estimated gap computed using general metrics, GaPO provides a new supervisory signal that explicitly highlights the nuances between preference pairs. This new signal helps the model allocate gradients more rationally during optimization, facilitating more effective learning from the preference data.
Experiments conducted with a strong base model, Llama-3-8B-Instruct, demonstrate that GaPO surpasses state-of-the-art methods on widely used benchmarks. Our best-performing model, GaPO-ROUGE\_L, achieves a win rate of 52.8\% on AlpacaEval 2.0, exceeding the baseline methods by 5.3 points.
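The gap-aware margin described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss, so everything here is an assumption made for illustration: a SimPO-style objective $-\log\sigma(\beta(\log p_w - \log p_l) - \gamma g)$, a gap $g = 1 - \text{ROUGE-L}$ between the chosen and rejected responses, whitespace tokenization, and the hypothetical names `semantic_gap`, `gap_aware_loss`, `beta`, and `gamma`. It is not the authors' implementation.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on whitespace tokens (simplified; no stemming)."""
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def semantic_gap(chosen, rejected):
    """Assumed gap estimate: 0 for identical responses, 1 for no overlap."""
    return 1.0 - rouge_l_f1(chosen, rejected)


def gap_aware_loss(logp_chosen, logp_rejected, gap, beta=2.0, gamma=1.0):
    """Assumed loss: -log sigmoid(beta * (logp_w - logp_l) - gamma * gap).

    The fixed margin of a SimPO-style loss is replaced by a per-pair
    margin proportional to the estimated gap.
    """
    z = beta * (logp_chosen - logp_rejected) - gamma * gap
    # Numerically stable -log(sigmoid(z)).
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    chosen = "the cat sat on the mat"
    rejected = "a dog ran across the road"
    g = semantic_gap(chosen, rejected)
    print(gap_aware_loss(-1.0, -1.5, g))
```

Under these assumptions, a pair whose responses differ more receives a larger margin, so the same log-probability difference yields a larger loss and the model is pushed to separate that pair further; near-identical pairs contribute a margin close to zero.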
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4623