Keywords: code generation training, repo-level code generation
TL;DR: Automatically constructing repo-level execution-based environments for training and evaluation.
Abstract: We introduce RepoST, a scalable method for building repository-level code generation environments that provide execution feedback for model training. Existing approaches require building the entire repository for execution, which is challenging for both humans and LLMs and limits dataset scalability. Instead, we leverage sandbox testing, which isolates the target function and its dependencies in a separate script for testing. At inference time, models still access the full repository for code generation, while the sandbox script provides execution feedback. Using our method, we construct RepoST-Train, a large-scale training set with 7,415 functions from 824 repositories. Training with the execution feedback provided by RepoST-Train yields gains of 5.5% Pass@1 on HumanEval and 3.5% Pass@1 on RepoEval.
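To illustrate the sandbox testing idea described above, here is a minimal, hypothetical sketch of what an isolated test script might look like. The function names and test case are invented for illustration and are not taken from the RepoST dataset; the point is that the target function and its copied dependency live in one self-contained file, so the test executes without building the full repository.

```python
# Hypothetical sandbox test script: target function + copied dependency
# isolated in a single file, runnable without the rest of the repository.

def normalize(text: str) -> str:
    # Dependency copied from elsewhere in the repository into the sandbox.
    return " ".join(text.lower().split())

def score_tokens(text: str) -> int:
    # Target function: at inference, a model generates this body while
    # reading the real repository; the sandbox script then executes the
    # generated code to produce pass/fail feedback.
    return len(normalize(text).split())

def test_score_tokens() -> None:
    assert score_tokens("  Hello   World ") == 2
    assert score_tokens("") == 0

if __name__ == "__main__":
    test_score_tokens()
    print("All sandbox tests passed.")  # execution feedback signal
```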
Submission Number: 8