## Guide: OpenEnv + SkyRL

This directory holds the workflow to train on PyTorch OpenEnv environments with SkyRL.

In this guide, we walk through how to train a reinforcement learning agent using SkyRL with [PyTorch OpenEnv](https://github.com/meta-pytorch/OpenEnv) environments. OpenEnv provides isolated execution environments for agentic RL training with Gymnasium-style APIs.


Start by following the SkyRL [installation instructions](https://skyrl.readthedocs.io/en/latest/getting-started/installation.html), then enter the `skyrl-train` directory:
```bash
cd SkyRL/skyrl-train
```

### 1) Environment Setup

Prerequisites: Docker must be installed, and the OpenEnv environment images you plan to use must be available locally (the next step pulls them).

First, install the OpenEnv environments (i.e., download the images for each environment):

```bash
# Execute from skyrl-train directory
uv run integrations/openenv/install_environment.py echo-env
# Or install all environments:
# uv run integrations/openenv/install_environment.py
```

This will pull the necessary Docker images for the OpenEnv environments.

Available environments: ``echo-env``, ``coding-env``, ``openspiel-env``, ``atari-env``, ``sumo-rl-env``, ``finrl-env``.
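If you only need a few environments, the install command can be looped over a list of names. The sketch below assumes it is run from the `skyrl-train` directory. `install_envs` and the `DRY_RUN` switch are hypothetical conveniences, not part of SkyRL, and the Docker check only confirms that the `docker` CLI is on your `PATH`, not that the daemon is running.

```bash
# Hypothetical helper (not part of SkyRL): install several OpenEnv environments.
# Run from the skyrl-train directory. DRY_RUN=1 prints the commands instead of running them.
install_envs() {
  # The environment images are pulled through Docker, so fail early if the CLI is missing
  if [ "${DRY_RUN:-0}" != "1" ] && ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; install Docker first" >&2
    return 1
  fi
  local env
  for env in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "uv run integrations/openenv/install_environment.py $env"
    else
      # Stop at the first environment that fails to install
      uv run integrations/openenv/install_environment.py "$env" || return 1
    fi
  done
}

# Example: pull only the environments that come with example datasets
# install_envs echo-env coding-env
```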


### 2) Dataset Preparation

For training, we use simple example datasets generated by the ``prepare_dummy_dataset.py`` script:

```bash
# Execute from skyrl-train directory
uv run integrations/openenv/prepare_dummy_dataset.py --output_dir ~/data/openenv --env_name echo_env
# Or generate datasets for all environments:
# uv run integrations/openenv/prepare_dummy_dataset.py --output_dir ~/data/openenv 
```

This creates training and validation datasets with example prompts for the specified environment (example prompts are currently provided for ``echo_env`` and ``coding_env``).

`prepare_dummy_dataset.py` has additional optional parameters:
  - `--output_dir`: directory to place datasets (default: `~/data/openenv`)
  - `--env_name`: specific environment to prepare dataset for (default: all environments)

Notes on dataset generation:
- This script will generate the following Parquet files under `output_dir`:
  - `train.parquet`
  - `validation.parquet`
- For issues loading the dataset, see the Troubleshooting section below.
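Before launching training, it can help to confirm that both Parquet files exist and are non-empty. A minimal sketch, where `check_openenv_dataset` is a hypothetical helper; pass it whichever directory your training run reads from (in `run_openenv.sh` this is `DATA_DIR`).

```bash
# Hypothetical helper: verify that train.parquet and validation.parquet exist
# and are non-empty in the given directory. Returns 0 if both are present, 1 otherwise.
check_openenv_dataset() {
  local dir="$1" missing=0 f
  for f in train.parquet validation.parquet; do
    # -s is true only for files that exist and have a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_openenv_dataset "$HOME/data/openenv/echo_env" && echo "dataset ready"
```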

### 3) Training

We provide an example training script for Qwen2.5-0.5B-Instruct on OpenEnv environments:

```bash
# Execute from skyrl-train directory
bash integrations/openenv/run_openenv.sh
```

Currently supported environments are: ``echo_env``, ``coding_env``, ``openspiel-env``, ``atari-env``, ``sumo-rl-env``, ``finrl-env``.

You can customize training by setting environment variables:

```bash
ENV_NAME=coding_env NUM_GPUS=2 bash integrations/openenv/run_openenv.sh
```

Or modify the commonly edited training settings in `run_openenv.sh` as needed:
```bash
ENV_NAME="coding_env"
DATA_DIR="$HOME/data/openenv/$ENV_NAME"
NUM_GPUS=4
LOGGER="wandb"
```
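An environment-variable override such as `NUM_GPUS=2` only takes effect if the script falls back to a default rather than assigning the value unconditionally. A minimal sketch of that pattern, assuming `${VAR:-default}` expansions; the defaults shown are illustrative and may not match `run_openenv.sh`, and the hypothetical `resolve_settings` function only exists to make the behavior easy to try.

```bash
# Sketch of env-var-overridable settings. Defaults are illustrative only.
resolve_settings() {
  # ${VAR:-default}: keep the caller's value if set and non-empty, else use the default
  ENV_NAME="${ENV_NAME:-echo_env}"
  NUM_GPUS="${NUM_GPUS:-4}"
  LOGGER="${LOGGER:-wandb}"
  # DATA_DIR is derived from ENV_NAME unless it was set explicitly
  DATA_DIR="${DATA_DIR:-$HOME/data/openenv/$ENV_NAME}"
  echo "env=$ENV_NAME gpus=$NUM_GPUS logger=$LOGGER data=$DATA_DIR"
}
```

With this pattern, `ENV_NAME=coding_env NUM_GPUS=2 resolve_settings` reports `coding_env` with 2 GPUs and points `DATA_DIR` at `$HOME/data/openenv/coding_env`, while running it with no overrides falls back to every default.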

All training parameters can be modified in `run_openenv.sh`, such as the model choice (`trainer.policy.model.path`), GRPO group size (`generator.n_samples_per_prompt`), or training batch size (`trainer.train_batch_size`).

See all available training configuration parameters in `ppo_base_config.yaml`.
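If you maintain your own copy of the script, collecting the dotted overrides mentioned above (`trainer.policy.model.path`, `generator.n_samples_per_prompt`, `trainer.train_batch_size`) in a bash array keeps them in one place. A sketch under that assumption: `MODEL_PATH`, `N_SAMPLES`, and `TRAIN_BATCH_SIZE` are hypothetical variable names, the group size and batch size defaults are illustrative rather than the values in `run_openenv.sh`, and how the array reaches the training entrypoint depends on your script.

```bash
# Hypothetical helper: build key=value config overrides into the global OVERRIDES array.
# Variable names and defaults are illustrative, not taken from run_openenv.sh.
build_overrides() {
  local model="${MODEL_PATH:-Qwen/Qwen2.5-0.5B-Instruct}"
  local n_samples="${N_SAMPLES:-4}"          # GRPO group size
  local batch_size="${TRAIN_BATCH_SIZE:-16}"
  OVERRIDES=(
    "trainer.policy.model.path=$model"
    "generator.n_samples_per_prompt=$n_samples"
    "trainer.train_batch_size=$batch_size"
  )
}

build_overrides
# Each element expands to a separate argument, so values containing spaces stay intact
printf '%s\n' "${OVERRIDES[@]}"
```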



## Tips

- **Docker Resources**: Ensure sufficient Docker resources are available, especially for computationally intensive environments like Atari or OpenSpiel.
- **Generation Format**: For dummy testing, each generation is currently expected to contain a single action wrapped in ``<action>...</action>`` tags. To use custom parsing logic, modify `_get_openenv_action` in the OpenEnv environment wrapper (`integrations/openenv/env.py`).
- **Environment Variables**: You can override default values with environment variables such as `NUM_GPUS=1`, `ENV_NAME=coding_env`, or `MAX_TURNS=1`.
- **Logging**: Set `LOGGER=console` to print logs to stdout instead of using wandb.
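To make the expected generation format concrete, the sketch below extracts the first ``<action>...</action>`` payload from a response string. It only illustrates the convention in shell; the actual parsing happens in `_get_openenv_action` in `integrations/openenv/env.py`, and this hypothetical `extract_action` helper ignores payloads that span multiple lines or contain `<`.

```bash
# Hypothetical helper: print the payload of the first <action>...</action> pair.
extract_action() {
  # grep -o prints each complete single-line tag pair whose payload has no '<';
  # head keeps the first match; sed strips the surrounding tags
  printf '%s\n' "$1" \
    | grep -o '<action>[^<]*</action>' \
    | head -n 1 \
    | sed -e 's/^<action>//' -e 's/<\/action>$//'
}

extract_action 'Let me reply. <action>hello world</action>'
# → hello world
```

A response without a complete tag pair produces empty output, which is a convenient signal for treating the turn as malformed.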

## Troubleshooting

For issues with SkyRL or the integration with OpenEnv, please [open an Issue](https://github.com/NovaSky-AI/SkyRL/issues/new). 

### Datasets

All environment integrations currently use dummy datasets. To train on real data, modify `prepare_dummy_dataset.py` to extract and prepare the appropriate datasets.

## TODOs and Limitations
We welcome any contributions to help resolve the remaining tasks!
* Make it easier to specify different OpenEnv environments for training and validation.
* Make it easier to specify which dataset splits to use.
