Keywords: LLM, reinforcement learning, environment setup, machine learning for software engineering
TL;DR: We applied online reinforcement learning with verifiable rewards to train Qwen3-8B for the environment setup task, surpassing GPT-4o-mini and matching a 32B model on EnvBench.
Abstract: Environment setup—the process of configuring the system to work with a specific
software project—represents a persistent challenge in Software Engineering (SE).
Automated environment setup methods could assist developers by providing fully
configured environments for arbitrary repositories without manual effort. This
also helps SE researchers to scale execution-based benchmarks. However, recent
studies reveal that even state-of-the-art Large Language Models (LLMs) achieve
limited success on automating this task. To address this limitation, we employ
an online Reinforcement Learning with Verifiable Rewards approach to improve
the environment setup capabilities of LLMs. As outcome-based rewards for en-
vironment setup require containerisation of each sample and are computationally
expensive, we leverage lightweight proxy rewards. On EnvBench-Python, our
method enables Qwen3-8B (a model runnable on consumer hardware) to set up
15.8 out of 329 repositories on average over five runs. This is a +690% gain over
the base model and +58% over GPT-4o-mini at comparable cost. Our replication
package with training code and trained model checkpoints is available online:
https://github.com/envsetup-rl-dl4c/envsetup-rl.
Submission Number: 47