Abstract: Large Language Models (LLMs) have recently demonstrated remarkable coding capabilities. However, assessing generated code against verifiable properties and aligning it with developer preferences remains challenging. In this paper, we explore two key questions under the new challenge of code preference learning: (i) how can we train models to predict meaningful preferences for code? and (ii) how do code preferences based on verifiers, humans, and neural models align with each other? To this end, we introduce **CodeFavor**, an open recipe for training pairwise code preference models from synthetic code evolution, including code commits and code critiques. We evaluate code preferences via **CodePrefBench**, a new benchmark of 1,364 rigorously curated code preference tasks covering three verifiable properties (correctness, efficiency, and security) along with human preference. Our evaluation shows that CodeFavor holistically improves model-based code preferences by up to $28.8\%$. Our comprehensive controlled experiments also validate the design choices in CodeFavor. Furthermore, we quantify the cost and limitations of human-based code preference: (i) despite spending 23 person-minutes per task, $15{\sim}40\%$ of tasks remain unsolved; and (ii) human preference is the most accurate on code correctness, while it underperforms model-based preferences on non-functional objectives.
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Code Generation, Code Language Models, Code Preference, Preference Learning, Machine Learning for Code
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English, Python
Submission Number: 7327