Do LLMs Understand Code Preference? Training Code Preference Models via Synthetic Code Evolution

Published: 06 Mar 2025, Last Modified: 06 Mar 2025 · DL4C @ ICLR 2025 · CC BY 4.0
Track: long paper (up to 9 pages)
Keywords: Code Generation, Large Language Model, Code Preference
Abstract:

Large Language Models (LLMs) have recently demonstrated remarkable coding capabilities. However, assessing code generation against verifiable properties and aligning it with developer preferences remains challenging. In this paper, we explore two key questions under the new challenge of code preference learning: (i) how to train models to predict meaningful preferences for code, and (ii) how well code preferences based on verifiers, humans, and neural models align with each other. To this end, we introduce CodeFavor, an open recipe for training pairwise code preference models from synthetic code evolution, including code commits and code critiques. We evaluate code preferences via CodePrefBench, a new benchmark of 1,364 rigorously curated code preference tasks covering three verifiable properties (correctness, efficiency, and security) along with human preference. Our evaluation shows that CodeFavor holistically improves model-based code preferences by up to 28.8%. Our comprehensive controlled experiments also validate the design choices in CodeFavor. Furthermore, we quantify the cost and limitations of human-based code preference: (i) despite 23 person-minutes spent per task, 15–40% of tasks remain unsolved; and (ii) human preference is the most accurate on code correctness while underperforming model-based preferences on non-functional objectives.
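The pairwise setup in the abstract can be illustrated with a minimal sketch: a preference model takes two candidate programs and predicts which one is preferred, and a benchmark scores those predictions against gold labels. The `score_candidate` heuristic below is a hypothetical stand-in for a learned preference model (it is not the paper's method); only the overall pairwise-accuracy framing follows the abstract.

```python
# Minimal sketch of pairwise code preference evaluation.
# score_candidate is a toy heuristic standing in for a trained preference model.

def score_candidate(code: str) -> float:
    """Toy proxy score: reward code that validates its inputs."""
    return float("assert" in code or "raise" in code)

def predict_preference(code_a: str, code_b: str) -> str:
    """Return 'A' if the model prefers the first candidate, else 'B'."""
    return "A" if score_candidate(code_a) >= score_candidate(code_b) else "B"

def pairwise_accuracy(tasks):
    """tasks: iterable of (code_a, code_b, gold) triples, gold in {'A', 'B'}."""
    tasks = list(tasks)
    correct = sum(predict_preference(a, b) == gold for a, b, gold in tasks)
    return correct / len(tasks)

# Two toy preference tasks with gold labels.
tasks = [
    ("def div(a, b):\n    assert b != 0\n    return a / b",
     "def div(a, b):\n    return a / b",
     "A"),
    ("def read(p):\n    return open(p).read()",
     "def read(p):\n    if not p:\n        raise ValueError('empty path')\n    return open(p).read()",
     "B"),
]
print(pairwise_accuracy(tasks))  # 1.0 on this toy set
```

In the paper's setting the scoring step is a model trained on synthetic code evolution data rather than a string heuristic, but the evaluation loop, comparing pairwise predictions to labels derived from verifiable properties, has this shape.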

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: Zijian Wang
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 54
