IotaCode: A Small Code Model Can Be Reinforced to Beat the Bigger One

ACL ARR 2024 June Submission 2961 Authors

15 Jun 2024 (modified: 02 Jul 2024), License: CC BY 4.0
Abstract: Large language models (LLMs) are one of the most rapidly developing areas of machine learning research. To fine-tune LLMs so that they better align with user requests and values, reinforcement learning from human feedback (RLHF) techniques have been developed, allowing negative as well as positive examples to be incorporated into training. An important application domain for large language models is the analysis and generation of source code. In this study, we investigate how modern RLHF algorithms can be applied to code generation using the CodeContests problem set. The best results were achieved with the Proximal Policy Optimization (PPO) algorithm, which significantly improves over the supervised fine-tuning baseline and produces IotaCode, a model with 1.3 billion parameters that surpasses the performance of the 9-billion-parameter AlphaCode model.
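
The following is a minimal illustrative sketch, not the authors' code, of how PPO-based fine-tuning of a small code model might be set up. It assumes the Hugging Face TRL 0.x API (PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead); the base model name, the `problems` iterable of CodeContests tasks, and the test-execution reward function `run_tests_reward` are all hypothetical placeholders.

# Sketch: PPO fine-tuning of a ~1.3B code LM against a test-execution reward.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

MODEL_NAME = "deepseek-ai/deepseek-coder-1.3b-base"  # assumption: any ~1.3B code LM

config = PPOConfig(model_name=MODEL_NAME, learning_rate=1.4e-5,
                   batch_size=8, mini_batch_size=2)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

# Policy (with value head) and a frozen reference model for the KL penalty.
policy = AutoModelForCausalLMWithValueHead.from_pretrained(MODEL_NAME)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(MODEL_NAME)
ppo_trainer = PPOTrainer(config, policy, ref_model, tokenizer)

def run_tests_reward(solution_code: str, problem) -> float:
    """Hypothetical reward: compile the generated C++ solution and return the
    fraction of the problem's test cases it passes (0.0 to 1.0)."""
    raise NotImplementedError  # depends on the local compilation/judging sandbox

for problem in problems:  # `problems`: an iterable of CodeContests tasks (assumed)
    query = tokenizer.encode(problem.statement, return_tensors="pt").squeeze(0)
    response = ppo_trainer.generate(query, max_new_tokens=512, do_sample=True,
                                    return_prompt=False)
    code = tokenizer.decode(response.squeeze(0), skip_special_tokens=True)
    reward = torch.tensor(run_tests_reward(code, problem))
    # One PPO optimization step on this (query, response, reward) triple.
    ppo_trainer.step([query], [response.squeeze(0)], [reward])

The reward design here (fraction of passed tests) is only one plausible choice for CodeContests-style problems; the paper's actual reward and training configuration may differ.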
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: code generation and understanding, reinforcement learning, fine-tuning
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English, C++
Submission Number: 2961