Abstract: Reinforcement learning (RL)-based large language models have recently demonstrated significant promise for code generation, but current approaches are typically constrained by limited data, particularly in specialized domains, and by simplistic reward designs that fail to adequately capture complex semantic relationships in code. We present CodePO, which extends GRPO with a lightweight, rule-based composite reward framework. CodePO introduces enhanced reward rules for richer code-similarity evaluation and refines the computation of the advantage function, yielding more accurate and stable policy updates during training. Experiments on both domain-specific and general datasets such as TACO demonstrate that CodePO significantly improves code generation accuracy and quality. Ablation studies confirm the benefits of the composite reward and adaptive tuning, highlighting CodePO’s effectiveness for real-world programming tasks.
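To make the reward and advantage design concrete, the sketch below pairs a standard GRPO-style group-normalized advantage with an illustrative rule-based composite reward. The weights, the `composite_reward` helper, and its inputs (a unit-test pass rate and a code-similarity score) are assumptions made for illustration, not CodePO's actual formulation:

```python
import statistics

def composite_reward(pass_rate: float, similarity: float,
                     w_pass: float = 0.8, w_sim: float = 0.2) -> float:
    # Illustrative rule-based composite reward: a weighted mix of a
    # unit-test pass rate and a code-similarity score. The weights
    # here are assumed for this sketch, not taken from the paper.
    return w_pass * pass_rate + w_sim * similarity

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # Standard GRPO-style advantage: normalize each rollout's reward
    # by the mean and standard deviation of its sampled group.
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: one prompt, a group of four sampled completions,
# each scored as (pass_rate, similarity).
group = [(1.0, 0.9), (0.5, 0.7), (0.0, 0.4), (1.0, 0.6)]
rewards = [composite_reward(p, s) for p, s in group]
print(group_advantages(rewards))
```

Because advantages are normalized within each group of rollouts, only the relative ordering of the composite rewards drives the policy update, which is what keeps training stable under a hand-designed reward scale.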
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Efficient/Low-Resource Methods for NLP, Generation, Language Modeling, Machine Learning for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Theory
Languages Studied: English
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
A2 Elaboration: No. Our approach only involves publicly available, anonymized datasets and does not present risks of misuse or harm.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: In the References section at the end of the paper.
B2 Discuss The License For Artifacts: No
B2 Elaboration: No. We did not discuss the licenses for the artifacts used. We only used publicly available datasets and open-source code whose use is standard in the community.
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: No. We did not explicitly discuss compliance with intended use in the paper. All artifacts used are standard datasets and open-source code employed strictly for academic research purposes.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: No. We did not discuss these aspects because we only used publicly available datasets that do not contain personally identifiable information or offensive content.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: In Section 4.1.
B6 Statistics For Data: Yes
B6 Elaboration: In Section 4.1.
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: In Section 4.2.
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: In Section 4.2.
C3 Descriptive Statistics: Yes
C3 Elaboration: In Section 4.
C4 Parameters For Packages: No
C4 Elaboration: No. We did not explicitly report the parameters or implementation details for the packages used; default settings were applied throughout.
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: No
D1 Elaboration: No. We did not include the full instructions or screenshots in the paper due to space limitations.
D2 Recruitment And Payment: No
D2 Elaboration: No. Our participants were unpaid volunteers, and we did not include further details in the paper.
D3 Data Consent: No
D3 Elaboration: No. We did not discuss data consent, as we only used publicly available datasets that were previously collected and released with appropriate consent.
D4 Ethics Review Board Approval: No
D4 Elaboration: No. This work uses data that is publicly available and does not involve any new data collection from human participants.
D5 Characteristics Of Annotators: No
D5 Elaboration: No. Our annotators were recruited via crowdsourcing platforms, and we did not collect or report their demographic data.
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: No. AI assistants were used solely for translation purposes and not for research, coding, or substantive writing, so their use was not specifically mentioned in the paper.
Author Submission Checklist: Yes
Submission Number: 152