Abstract: Reinforcement learning (RL)-based large language models have recently demonstrated significant promise for code generation, but current approaches are typically constrained by limited data, particularly in specialized domains, and by simplistic reward designs that fail to adequately capture complex semantic relationships in code. We present CodePO, which extends GRPO with a lightweight, rule-based composite reward framework. CodePO introduces enhanced reward rules for richer code-similarity evaluation and refines the computation of the advantage function, yielding more accurate and stable policy updates during training. Experiments on both domain-specific and general datasets such as TACO demonstrate that CodePO significantly improves code generation accuracy and quality. Ablation studies confirm the benefits of the composite reward and adaptive tuning, highlighting CodePO’s effectiveness for real-world programming tasks.
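To make the reward and advantage design concrete, the sketch below pairs a standard GRPO-style group-normalized advantage with an illustrative rule-based composite reward. The weights, the `composite_reward` helper, and its inputs (a unit-test pass rate and a code-similarity score) are assumptions made for illustration, not CodePO's actual formulation:

```python
import statistics

def composite_reward(pass_rate: float, similarity: float,
                     w_pass: float = 0.8, w_sim: float = 0.2) -> float:
    # Illustrative rule-based composite reward: a weighted mix of a
    # unit-test pass rate and a code-similarity score. The weights
    # here are assumed for this sketch, not taken from the paper.
    return w_pass * pass_rate + w_sim * similarity

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # Standard GRPO-style advantage: normalize each rollout's reward
    # by the mean and standard deviation of its sampled group.
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: one prompt, a group of four sampled completions,
# each scored as (pass_rate, similarity).
group = [(1.0, 0.9), (0.5, 0.7), (0.0, 0.4), (1.0, 0.6)]
rewards = [composite_reward(p, s) for p, s in group]
print(group_advantages(rewards))
```

Because advantages are normalized within each group of rollouts, only the relative ordering of the composite rewards drives the policy update, which is what keeps training stable under a hand-designed reward scale.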
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Efficient/Low-Resource Methods for NLP, Generation, Language Modeling, Machine Learning for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Theory
Languages Studied: English
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
A2 Elaboration: No. Our approach only involves publicly available, anonymized datasets and does not present risks of misuse or harm.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: In the References section at the end of the paper.
B2 Discuss The License For Artifacts: No
B2 Elaboration: No. We did not discuss the licenses for the artifacts used. We only used publicly available datasets and open-source code whose use is standard in the community.
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: No. We did not explicitly discuss compliance with intended use in the paper. All artifacts used are standard datasets and open-source code employed strictly for academic research purposes.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: No. We did not discuss these aspects because we only used publicly available datasets that do not contain personally identifiable information or offensive content.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: In Section 4.1.
B6 Statistics For Data: Yes
B6 Elaboration: In Section 4.1.
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: In Section 4.2.
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: In Section 4.2.
C3 Descriptive Statistics: Yes
C3 Elaboration: In Section 4.
C4 Parameters For Packages: No
C4 Elaboration: No. We did not explicitly report the parameters or implementation details for the packages used; default settings were applied throughout.
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: No
D1 Elaboration: No. We did not include the full instructions or screenshots in the paper due to space limitations.
D2 Recruitment And Payment: No
D2 Elaboration: No. Our participants were unpaid volunteers, and we did not include further details in the paper.
D3 Data Consent: No
D3 Elaboration: No. We did not discuss data consent, as we only used publicly available datasets that were previously collected and released with appropriate consent.
D4 Ethics Review Board Approval: No
D4 Elaboration: No. This work uses data that is publicly available and does not involve any new data collection from human participants.
D5 Characteristics Of Annotators: No
D5 Elaboration: No. Our annotators were recruited via crowdsourcing platforms, and we did not collect or report their demographic data.
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: No. AI assistants were used solely for translation purposes and not for research, coding, or substantive writing, so their use was not specifically mentioned in the paper.
Author Submission Checklist: Yes
Submission Number: 152