Keywords: large language model, binary preference alignment, squeezing effect
Abstract: Unlike pairwise optimization methods, binary preference alignment algorithms do not process sample pairs jointly, which may lead to suboptimal probability distributions. Under the squeezing effect, probability mass flowing out of negative samples may diffuse into neutral regions, while the mass absorbed by positive samples may originate from those same neutral regions, so negative responses are penalized insufficiently. To address this issue, we propose PFO (Probability Flow Optimization). The algorithm dynamically evaluates each sample's generation probability and reweights samples to encourage probability mass to flow from negative to positive samples. Experiments and analysis on the general-purpose benchmarks MT-Bench and AlpacaEval 2 demonstrate the algorithm's effectiveness. Furthermore, experiments on recommendation-domain datasets show that the method also applies to sparse-feedback scenarios, supporting its broad applicability. Our work offers new insights into improving binary preference alignment from the perspective of probability flow.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 8642