Abstract: Direct Preference Optimization (DPO) has recently extended its success beyond aligning large language models (LLMs) to aligning text-to-image models with human preferences. However, the standard DPO approach can inadvertently reduce the sampling probabilities of both preferred and dispreferred items during alignment, thereby diminishing the model's generative capacity. In this paper, we first revisit DPO through the lens of contrastive loss. This analysis reveals that DPO emphasizes only the term quantifying the dissimilarity between items, while overlooking the terms pertaining to the positive items. We therefore propose Positive Enhanced Preference Alignment (PEPA). We introduce three enhancement strategies and, after comprehensive empirical evaluation, recommend enhancing the log probability ratio of the preferred item in practical applications, as it offers both stability and effectiveness. Experiments on the HPS-V2 test set show that PEPA outperforms or matches current state-of-the-art alignment techniques, highlighting its strong practical efficacy.
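To make the recommended strategy concrete, the following is a minimal PyTorch-style sketch of a DPO loss augmented with a term that raises the preferred item's log probability ratio. The function name, argument names, and the weighting coefficient `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def pepa_loss(policy_logp_w, policy_logp_l,
              ref_logp_w, ref_logp_l,
              beta: float = 0.1, lam: float = 0.1):
    """Sketch: DPO loss plus a preferred-ratio enhancement term.

    Inputs are per-sample log-probabilities of the preferred (w) and
    dispreferred (l) items under the policy and a frozen reference model.
    `lam` is a hypothetical weight for the positive-enhancement term.
    """
    # Log-ratios of policy vs. reference for preferred / dispreferred items.
    ratio_w = policy_logp_w - ref_logp_w
    ratio_l = policy_logp_l - ref_logp_l

    # Standard DPO term: push the preferred ratio above the dispreferred one.
    dpo_term = -F.logsigmoid(beta * (ratio_w - ratio_l))

    # Positive-enhancement term: additionally raise the preferred log-ratio,
    # so the preferred item's probability is not dragged down alongside the
    # dispreferred one.
    enhance_term = -lam * ratio_w

    return (dpo_term + enhance_term).mean()
```

In this sketch the plain DPO gradient only cares about the gap `ratio_w - ratio_l`, so both log probabilities may fall together; the extra `-lam * ratio_w` term explicitly rewards increasing the preferred item's probability relative to the reference.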