CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning

ACL ARR 2025 July Submission599 Authors

28 Jul 2025 (modified: 01 Sept 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Large language models (LLMs) have made significant progress in natural language understanding and generation, driven by scalable pretraining and advanced finetuning. However, enhancing reasoning abilities in LLMs, particularly via reinforcement learning from human feedback (RLHF), remains challenging due to the scarcity of high-quality preference data, which is labor-intensive to annotate and crucial for reward model (RM) finetuning. To alleviate this issue, we introduce CodePMP, a scalable preference model pretraining (PMP) pipeline that utilizes a large corpus of code-preference pairs synthesized from publicly available high-quality source code. By pretraining preference models on these large-scale synthesized pairs, CodePMP improves the sample efficiency of subsequent RM finetuning. We evaluate CodePMP on mathematical reasoning tasks (GSM8K, MATH) and logical reasoning tasks (ReClor, LogiQA2.0), consistently observing significant improvements in the reasoning performance of LLMs and highlighting the importance of scalable preference model pretraining for efficient reward modeling.
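The abstract describes pretraining a preference model on synthesized chosen/rejected code pairs. As a minimal illustration, the sketch below implements the standard Bradley-Terry pairwise ranking loss commonly used for reward-model training on preference pairs; the function name and values are illustrative, and the paper's exact objective may differ in detail.

```python
import math


def pairwise_preference_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Computed in a numerically stable form so large negative margins
    do not overflow math.exp.
    """
    margin = chosen_reward - rejected_reward
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# On a synthesized code-preference pair, training pushes the accepted
# solution's score above the rejected one's: a correct ordering yields
# a small loss, a reversed ordering a large one.
loss_correct_order = pairwise_preference_loss(2.0, -1.0)
loss_wrong_order = pairwise_preference_loss(-1.0, 2.0)
```

In practice the scalar rewards would come from a language-model head scoring each response; the loss above is then averaged over a batch of preference pairs.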
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: data-efficient training
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Previous URL: https://openreview.net/forum?id=92Z1W0kjaf
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability).
Justification For Not Keeping Action Editor Or Reviewers: I want the same area chair and the same set of reviewers.
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Ethics Statement section, Page 9
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 4.1 Experimental Settings
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Ethics Statement section, Page 9
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: 4.1 Experimental Settings
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: Ethics Statement section, Page 9
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 4.1 Experimental Settings
B6 Statistics For Data: Yes
B6 Elaboration: 4.1.1 CodePMP Settings, Appendix C RM Finetuning Dataset
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Appendix A Hyperparameters and Computational Cost
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Appendix A Hyperparameters and Computational Cost
C3 Descriptive Statistics: Yes
C3 Elaboration: 4.2 Experimental Results
C4 Parameters For Packages: Yes
C4 Elaboration: Appendix E Comprehensive Data Diversity Analysis
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D1 Elaboration: We did not use human subjects.
D2 Recruitment And Payment: N/A
D2 Elaboration: We did not use human subjects.
D3 Data Consent: N/A
D3 Elaboration: We did not use human subjects.
D4 Ethics Review Board Approval: N/A
D4 Elaboration: We did not use human subjects.
D5 Characteristics Of Annotators: N/A
D5 Elaboration: We did not use human subjects.
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
E1 Elaboration: We did not use AI assistants.
Author Submission Checklist: yes
Submission Number: 599