Do LLMs Understand Code Preference? Training Code Preference Models via Synthetic Code Evolution

Published: 06 Mar 2025, Last Modified: 06 Mar 2025 · DL4C @ ICLR 2025 · CC BY 4.0
Track: long paper (up to 9 pages)
Keywords: Code Generation, Large Language Model, Code Preference
Abstract:

Large Language Models (LLMs) have recently demonstrated remarkable coding capabilities. However, assessing code generation against verifiable properties and aligning it with developer preferences remains challenging. In this paper, we explore two key questions under the new challenge of code preference learning: (i) how to train models to predict meaningful preferences for code, and (ii) how well code preferences based on verifiers, humans, and neural models align with each other. To this end, we introduce CodeFavor, an open recipe for training pairwise code preference models from synthetic code evolution, including code commits and code critiques. We evaluate code preferences via CodePrefBench, a new benchmark of 1,364 rigorously curated code preference tasks covering three verifiable properties (correctness, efficiency, and security) along with human preference. Our evaluation shows that CodeFavor holistically improves model-based code preferences by up to 28.8%. Our comprehensive controlled experiments also validate the design choices in CodeFavor. Furthermore, we quantify the cost and limitations of human-based code preference: (i) despite 23 person-minutes spent per task, 15–40% of tasks remain unsolved; and (ii) human preference is the most accurate on code correctness while underperforming model-based preferences on non-functional objectives.
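The pairwise setup in the abstract can be illustrated with a minimal sketch: a preference model takes two candidate programs and predicts which one is preferred, and a benchmark scores those predictions against gold labels. The `score_candidate` heuristic below is a hypothetical stand-in for a learned preference model (it is not the paper's method); only the overall pairwise-accuracy framing follows the abstract.

```python
# Minimal sketch of pairwise code preference evaluation.
# score_candidate is a toy heuristic standing in for a trained preference model.

def score_candidate(code: str) -> float:
    """Toy proxy score: reward code that validates its inputs."""
    return float("assert" in code or "raise" in code)

def predict_preference(code_a: str, code_b: str) -> str:
    """Return 'A' if the model prefers the first candidate, else 'B'."""
    return "A" if score_candidate(code_a) >= score_candidate(code_b) else "B"

def pairwise_accuracy(tasks):
    """tasks: iterable of (code_a, code_b, gold) triples, gold in {'A', 'B'}."""
    tasks = list(tasks)
    correct = sum(predict_preference(a, b) == gold for a, b, gold in tasks)
    return correct / len(tasks)

# Two toy preference tasks with gold labels.
tasks = [
    ("def div(a, b):\n    assert b != 0\n    return a / b",
     "def div(a, b):\n    return a / b",
     "A"),
    ("def read(p):\n    return open(p).read()",
     "def read(p):\n    if not p:\n        raise ValueError('empty path')\n    return open(p).read()",
     "B"),
]
print(pairwise_accuracy(tasks))  # 1.0 on this toy set
```

In the paper's setting the scoring step is a model trained on synthetic code evolution data rather than a string heuristic, but the evaluation loop, comparing pairwise predictions to labels derived from verifiable properties, has this shape.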

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: Zijian Wang
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 54
