Keywords: Large Language Models, Preference Optimization
Abstract: LLM-based recommendation systems have been widely explored due to their extensive world knowledge and powerful reasoning capabilities. However, current approaches fail to fully leverage preference data for the recommendation task, which limits the performance of LLM-based recommendation. Although Direct Preference Optimization (DPO) has achieved significant success in aligning LLMs with human preferences, it treats all rejected items as a homogeneous group and therefore fails to capture users' diverse preferences, resulting in poor fine-grained preference discrimination. Our empirical analysis reveals that nearly half of all prediction errors stem from the model's inability to distinguish chosen items from high-preference rejected items that differ only subtly. To address this challenge, we propose an Expert-guided Adaptive Preference Optimization (EAPO) framework that pre-trains a lightweight recommendation model as an expert to assign personalized weights to preference sample pairs. Based on theoretical analysis, we design an adaptive $\beta$ strategy: applying smaller $\beta$ values to item pairs with similar preference levels to amplify reward differences, while using larger $\beta$ values for item pairs with large preference disparities to ensure learning stability.
Experimental results demonstrate that EAPO not only achieves superior performance on multiple benchmark datasets, but also offers plug-and-play compatibility with a variety of existing preference optimization methods, establishing a scalable paradigm for preference optimization in recommendation.
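The adaptive-$\beta$ idea can be illustrated with a minimal sketch of a pairwise DPO-style loss in which $\beta$ is modulated by an expert-estimated preference gap. This is an illustrative reconstruction, not the paper's implementation: the linear schedule, the bounds `beta_min`/`beta_max`, and all function names are assumptions for exposition (the abstract does not specify the exact schedule).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adaptive_beta(pref_gap, beta_min=0.05, beta_max=0.5):
    """Hypothetical schedule: pref_gap in [0, 1] is the expert's estimated
    preference gap between the chosen and rejected item. Similar pairs
    (small gap) get a small beta to amplify reward differences; disparate
    pairs (large gap) get a large beta for learning stability."""
    return beta_min + (beta_max - beta_min) * pref_gap

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta):
    """Standard DPO objective for one pair:
    -log sigmoid(beta * implicit reward margin), where the implicit reward
    is the policy-vs-reference log-probability ratio."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(sigmoid(beta * margin))

# A hard pair (subtle difference, small expert gap) is trained with a
# smaller beta than an easy pair with a large expert gap.
beta_hard = adaptive_beta(pref_gap=0.1)
beta_easy = adaptive_beta(pref_gap=0.9)
```

For a fixed positive margin, a smaller $\beta$ yields a larger per-pair loss (and gradient), which is one way to read the abstract's claim that small $\beta$ "amplifies reward differences" on subtle pairs.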
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 3984