Abstract: Utilizing large language models (LLMs) for code generation has greatly enhanced software development. However, despite advances in automated code generation frameworks that incorporate self-repair strategies, successful outcomes are not always guaranteed. This led us to explore a different approach to improving code quality: a brainstorm-select-repair framework. Given a natural language query, our framework first generates multiple candidate code snippets. These snippets are then run against test cases, and a text-similarity-based algorithm is used to identify the most accurate one. If none of the generated snippets passes the test cases, our proposed self-repair strategy rectifies the flawed snippets until a snippet that passes is found and returned to the user. Based on ChatGPT3, our framework reached 89% Pass@1 on the HumanEval dataset, at least 3% higher than ChatGPT-4, the best model to date on this dataset. Our framework also surpasses state-of-the-art methods on the HumanEval-ET, MBPP, and MBPP-ET datasets. To further validate our design rationale, we conducted comprehensive experiments and analyzed the impact of each component.
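The abstract outlines a three-stage pipeline (brainstorm, select, repair). The following minimal Python sketch illustrates that control flow under stated assumptions: the helper functions (`generate_candidates`, `run_tests`, `similarity`, `repair`) are hypothetical stand-ins for an LLM sampling call, a test harness, a text-similarity scorer, and an LLM-based repair prompt, not the paper's actual implementation.

```python
from difflib import SequenceMatcher


def generate_candidates(query: str, n: int) -> list[str]:
    """Stand-in for sampling n candidate snippets from an LLM."""
    return [f"# candidate {i} for: {query}\n" for i in range(n)]


def run_tests(code: str, tests: list[str]) -> bool:
    """Stand-in for executing a snippet against the given test cases."""
    return False


def similarity(a: str, b: str) -> float:
    """Text similarity between two snippets (simple ratio used here)."""
    return SequenceMatcher(None, a, b).ratio()


def repair(query: str, code: str, tests: list[str]) -> str:
    """Stand-in for prompting the LLM to fix a failing snippet."""
    return code


def brainstorm_select_repair(query: str, tests: list[str],
                             n_candidates: int = 5, max_repairs: int = 3) -> str:
    # Brainstorm: sample several independent candidate snippets.
    candidates = generate_candidates(query, n_candidates)

    def consensus(c: str) -> float:
        # Rank each snippet by its total text similarity to the other candidates.
        return sum(similarity(c, other) for other in candidates if other is not c)

    # Select: prefer a snippet that already passes all test cases,
    # ranked by similarity to the rest of the brainstormed pool.
    passing = [c for c in candidates if run_tests(c, tests)]
    if passing:
        return max(passing, key=consensus)

    # Repair: iteratively fix the highest-ranked failing snippet until it
    # passes the tests or the repair budget is exhausted.
    best = max(candidates, key=consensus)
    for _ in range(max_repairs):
        best = repair(query, best, tests)
        if run_tests(best, tests):
            break
    return best
```

The selection and repair budgets (`n_candidates`, `max_repairs`) are illustrative parameters; the paper's actual sampling counts, similarity measure, and repair prompting are not specified in the abstract.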