<p align="center">
  <img src="asset/codegym_logo_v2.svg" alt="CodeGym Logo" width="500"/>
</p>

# Generalizable End-to-End Tool-Use RL with Synthetic CodeGym



CodeGym is a synthetic environment generation framework for LLM agent reinforcement learning on tool-use tasks. It automatically converts static code problems into interactive CodeGym environments where agents can learn to use tools to solve complex tasks in various configurations.

## Overview

<p align="center">
  <img src="asset/teaser.png" alt="CodeGym overview" width="1440"/>
</p>

CodeGym transforms traditional code problems into interactive environments where LLM agents can learn to:
- Use tools and actions to solve problems step-by-step
- Learn generalizable tool-use behaviors

## Environment Synthesis Process

<p align="center">
  <img src="asset/synthesis.png" alt="CodeGym environment synthesis process" width="1440"/>
</p>

We synthesize and verify CodeGym environments in two stages:

**Gym Synthesis:**
- Extract reusable code logic and functions from programming solutions
- Convert them into a library of documented tools and utilities
- Generate OpenAI Gym format environments with state, actions, transitions, and rewards
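
To make the state, action, transition, and reward structure concrete, here is a minimal sketch of what a Gym-style tool-use environment could look like. The class `MaxOfListEnv`, its tool names, and the action dictionary format are hypothetical and chosen for illustration only; the environments actually produced by CodeGym may be structured differently.

```python
class MaxOfListEnv:
    """Hypothetical toy environment: find the maximum of a hidden list via tools."""

    def __init__(self, numbers):
        self.numbers = list(numbers)  # hidden state the agent cannot read directly
        self.finished = False

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.finished = False
        return "Find the maximum of the hidden list. Tools: length, get, done."

    def step(self, action):
        """Apply one tool call and return (observation, reward, done).

        `action` is a dict such as {"name": "get", "args": {"index": 0}}.
        """
        if self.finished:
            raise RuntimeError("episode finished; call reset() first")
        name = action["name"]
        args = action.get("args", {})
        if name == "length":
            return str(len(self.numbers)), 0.0, False
        if name == "get":
            return str(self.numbers[args["index"]]), 0.0, False
        if name == "done":
            # Submitting an answer ends the episode; reward only if it is correct.
            self.finished = True
            reward = 1.0 if args["answer"] == max(self.numbers) else 0.0
            return "episode finished", reward, True
        return f"unknown tool: {name}", 0.0, False


# Example rollout with a hand-written policy.
env = MaxOfListEnv([3, 9, 4])
env.reset()
n = int(env.step({"name": "length"})[0])
best = max(int(env.step({"name": "get", "args": {"index": i}})[0]) for i in range(n))
observation, reward, done = env.step({"name": "done", "args": {"answer": best}})
```

In this sketch the agent never sees the list itself; it can only reach the answer by calling tools step by step, which is the behavior the environments are meant to train.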

**Gym Verification:**
- Generate comprehensive unit tests spanning multiple difficulty levels
- Validate environment correctness (no compilation errors, timeouts, or memory issues)
- Verify solvability by generating solution functions that successfully use the provided tools
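
As a rough illustration of the correctness check, the sketch below runs a piece of generated code in a separate Python process and rejects it if it crashes or exceeds a time limit. The function name `runs_cleanly` and the default timeout are assumptions made for this example; it does not cover memory limits and is not necessarily how the repository's scripts perform validation.

```python
import subprocess
import sys


def runs_cleanly(source: str, timeout_seconds: float = 5.0) -> bool:
    """Return True if `source` runs to completion without error or timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],  # isolate the code in a child process
            capture_output=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run when the timeout expires.
        return False
    return result.returncode == 0
```

Running the code in a child process keeps a hanging or crashing environment from taking down the verification loop itself.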

## Key Result

LLMs trained in CodeGym generalize better to out-of-distribution (OOD) benchmarks:

<p align="center">
  <img src="asset/key_result.png" alt="CodeGym key result" width="1440"/>
</p>

## CodeGym Example

Example CodeGym environments are listed in `examples/envs`, and example training instances are listed in `training_instance_example.jsonl`. Both are randomly sampled from the full CodeGym dataset.
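
To inspect the example instances, a small JSONL reader is enough. The helper `read_jsonl` below is a hypothetical utility, not part of the repository, and it makes no assumption about which fields each record contains.

```python
import json


def read_jsonl(path):
    """Read a JSON Lines file into a list of Python objects, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, `read_jsonl("training_instance_example.jsonl")` returns one object per line, and printing the keys of the first record shows which fields are available.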

## CodeGym Generation Pipeline

We provide an example coding problem dataset (`example/raw_problems.jsonl`). The step-by-step CodeGym generation process is as follows:

**Step 1:** Run `gym/0_gym_gen.py` to generate prompts for environment synthesis. If `--online-inference` is enabled, the script automatically calls the OpenAI API to generate the environments; otherwise, run inference on the prompts with your own LLM.

**Step 2:** Process the generated environments by running `gym/0_gym_lease.py`. This produces a JSONL file containing only the environments that passed the compilation check.
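
One simple way to approximate a compilation check in Python is the built-in `compile()` function, which parses source code without executing it. The sketch below is an assumption about what such a check could look like; the function name `passes_compilation_check` is hypothetical and the actual logic in `gym/0_gym_lease.py` may differ.

```python
def passes_compilation_check(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        # compile() only parses and byte-compiles; it does not run the code.
        compile(source, "<generated_env>", "exec")
    except (SyntaxError, ValueError):
        return False
    return True
```

A check like this catches malformed LLM output early, but it says nothing about runtime behavior, which is why later steps run unit tests and solution functions.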

**Step 3:** Run `gym/1_unit_test_gen.py` to generate unit tests that validate each environment's functionality and correctness. As in Step 1, if `--online-inference` is enabled, the script automatically calls the OpenAI API to generate the unit tests. After running inference on the prompts, run `gym/1_unit_test_lease.py` to extract the correctly formatted unit test inputs.

**Step 4:** Run `gym/2_solve_fc_gen.py` to generate solution functions, which serve as reference solutions showing how to use the provided tools to solve each environment. As in Step 1, if `--online-inference` is enabled, the script automatically calls the OpenAI API to generate the solution functions. After running inference on the prompts, run `gym/2_solve_fc_lease.py` to extract the correctly formatted solution functions.

**Step 5:** Run `gym/3_solve_fc_with_unit_test.py` to check whether any solution function passes all unit tests. If at least one does, the environment is considered valid.
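
The validity rule in this step can be sketched as follows. This is a simplified illustration, assuming each solution function is a Python callable and each unit test is an `(input, expected_output)` pair; the names `passes_all_tests` and `environment_is_valid` are hypothetical, and the data format used by the actual script may differ.

```python
def passes_all_tests(solve, unit_tests):
    """Return True if `solve` produces the expected output on every unit test."""
    for test_input, expected in unit_tests:
        try:
            if solve(test_input) != expected:
                return False
        except Exception:
            # A crashing solution counts as a failure, not as an error in the check.
            return False
    return True


def environment_is_valid(candidate_solutions, unit_tests):
    """An environment is valid if at least one candidate passes all unit tests."""
    return any(passes_all_tests(solve, unit_tests) for solve in candidate_solutions)
```

Requiring only one passing candidate tolerates imperfect LLM-generated solutions while still demonstrating that the environment is solvable.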

**Step 6:** Run `gym/4_extract_env_to_server.py` to produce all verified environments and their corresponding task configurations.

> **Note:** The default prompts are in English. To use Chinese prompts, switch to the corresponding `prompt_cn` version of the prompt files.