Keywords: code generation training, repo-level code generation
TL;DR: Automatically constructing repo-level execution-based environments for training and evaluation.
Abstract: We introduce RepoST, a scalable method for building repository-level code generation environments that provide execution feedback for model training. Existing approaches require building the entire repository for execution, which is challenging for both humans and LLMs and limits dataset scalability. Instead, we leverage sandbox testing, which isolates the target function and its dependencies in a separate script for testing. At inference time, models still access the full repository for code generation, while the sandbox script provides execution feedback. Using our method, we construct RepoST-Train, a large-scale training set with 7,415 functions from 824 repositories. Training with the execution feedback provided by RepoST-Train yields gains of 5.5% Pass@1 on HumanEval and 3.5% Pass@1 on RepoEval.
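To illustrate the sandbox testing idea described above, here is a minimal, hypothetical sketch of what an isolated test script might look like. The function names and test case are invented for illustration and are not taken from the RepoST dataset; the point is that the target function and its copied dependency live in one self-contained file, so the test executes without building the full repository.

```python
# Hypothetical sandbox test script: target function + copied dependency
# isolated in a single file, runnable without the rest of the repository.

def normalize(text: str) -> str:
    # Dependency copied from elsewhere in the repository into the sandbox.
    return " ".join(text.lower().split())

def score_tokens(text: str) -> int:
    # Target function: at inference, a model generates this body while
    # reading the real repository; the sandbox script then executes the
    # generated code to produce pass/fail feedback.
    return len(normalize(text).split())

def test_score_tokens() -> None:
    assert score_tokens("  Hello   World ") == 2
    assert score_tokens("") == 0

if __name__ == "__main__":
    test_score_tokens()
    print("All sandbox tests passed.")  # execution feedback signal
```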
Submission Number: 8