Abstract: Reinforcement Learning from Human Feedback (RLHF) and its variants have demonstrated remarkable performance in aligning models with human values and intentions to generate helpful, harmless, and honest responses. However, most of these methods rely on costly human-annotated pairwise comparisons for supervised alignment, which are ill-suited to list-level scenarios such as community question answering. Additionally, human preferences are shaped by multiple intrinsic attributes of responses, leading to inconsistent preference decisions.
Therefore, we propose **Self-supervised Attribute-aware dynamic preference ranking** (*SeAdpra*). It quantifies preference differences between responses via Attribute-Perceptual Distance Factors (APDF) and dynamically determines the list-wise alignment order, enabling fine-grained preference-difference learning and precise alignment with the optimal response.
We further constructed a challenging code-preference dataset, **StaCoCoQA**, and introduced more cost-effective and scalable preference evaluation metrics: **PrefHit** and **PrefRecall**. Extensive experimental results show that *SeAdpra* exhibits superior performance and generalizability on both StaCoCoQA and preference datasets from eight popular domains.
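The abstract does not define PrefHit and PrefRecall; the sketch below is a minimal, purely illustrative interpretation, assuming they behave like top-k hit and recall computed against a human preference ranking of candidate answers. The function names, signatures, and cutoff `k` are hypothetical and are not the paper's implementation.

```python
from typing import Sequence


def pref_hit(ranked_ids: Sequence[str], gold_best_id: str, k: int = 1) -> float:
    """1.0 if the human-preferred best answer appears in the model's top-k, else 0.0."""
    return 1.0 if gold_best_id in list(ranked_ids)[:k] else 0.0


def pref_recall(ranked_ids: Sequence[str], gold_top_ids: Sequence[str], k: int) -> float:
    """Fraction of the human top-k preferred answers recovered in the model's top-k."""
    if not gold_top_ids:
        return 0.0
    top_k = set(list(ranked_ids)[:k])
    return len(top_k & set(gold_top_ids)) / len(gold_top_ids)


# Example: a model ranks five community answers for one question (best first).
ranked = ["a3", "a1", "a5", "a2", "a4"]
print(pref_hit(ranked, gold_best_id="a1", k=1))             # 0.0
print(pref_recall(ranked, gold_top_ids=["a1", "a3"], k=2))  # 1.0
```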
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Preference Alignment, Community Question Answering, Large Language Model, Reinforcement Learning from Human Feedback
Contribution Types: NLP engineering experiment, Reproduction study, Data resources
Languages Studied: English
Submission Number: 1832