SkyRL + OpenEnv: Training a RL Agent in OpenEnv
===========================================================

In this example, we walk through a simple example on how to train a reinforcement learning agent using SkyRL with `OpenEnv <https://github.com/meta-pytorch/OpenEnv>`_ environments. OpenEnv provides isolated execution environments for agentic RL training with Gymnasium-style APIs.

How does it work?
------------------

SkyRL integrates with any Gymnasium API-based environment easily with SkyRL-Gym, which provides a simple interface for text-based environments called ``BaseTextEnv``. We integrate OpenEnv environments through a custom environment wrapper ``OpenEnv`` that implements the ``BaseTextEnv`` interface. This wrapper allows SkyRL to interact with various OpenEnv environments, including:

- **Echo Environment**: Simple echo environment for testing
- **Coding Environment**: Python code execution in sandboxed environment  
- **OpenSpiel Environment**: Game environments using OpenSpiel
- **Atari Environment**: Classic Atari game environments
- **SUMO-RL Environment**: Traffic simulation environments
- **FinRL Environment**: Financial trading environments

The integration works by:

1. **Environment Registration**: The OpenEnv environment is registered dynamically in the entrypoint using ``register()`` from ``skyrl_gym.envs``
2. **Environment Initialization**: SkyRL creates an OpenEnv client using ``from_docker_image()`` to connect to the appropriate Docker container
3. **Action Parsing**: LLM responses are parsed into environment-specific actions (e.g., ``EchoAction``, ``CodeAction``)
4. **Step Execution**: Actions are executed in the isolated environment and observations/rewards are returned
5. **Episode Management**: The environment tracks conversation history and manages episode termination

At a high level, the integration looks as follows:

.. code-block:: python

    # The OpenEnv wrapper class
    class OpenEnv(BaseTextEnv):
        def __init__(self, env_config: DictConfig, extras: Dict[str, Any] = {}):
            self.env_name = extras["env_name"]
            self.env_type = self._get_env_class(self.env_name)
            self.env = self.env_type.from_docker_image(self.env_name + ":latest")
            self.initial_step_result = self.env.reset()

        def step(self, action: str) -> BaseTextEnvStepOutput:
            action = self._get_openenv_action(self.env_name, action)
            result = self.env.step(action)
            # Process result and return observations, reward, done


Finally, we also register the new environment in the entrypoint script:

.. code-block:: python
        
    # In integrations/openenv/entrypoints/main_openenv.py
    from skyrl_gym.envs import register
    
    register(
        id="openenv",
        entry_point="integrations.openenv.env:OpenEnv",
    )


Environment Setup
-----------------

Prerequisites: Ensure that you have Docker installed

First, we need to install the OpenEnv environments:

.. code-block:: bash

    # Execute from skyrl-train directory
    cd SkyRL/skyrl-train
    uv run integrations/openenv/install_environment.py echo-env
    # Or install all environments:
    # uv run integrations/openenv/install_environment.py

This will pull the necessary Docker images for the OpenEnv environments.

Dataset Preparation
-------------------

For training, we use simple example datasets generated by the ``prepare_dummy_dataset.py`` script:

.. code-block:: bash
    
    # Execute from skyrl-train directory
    cd SkyRL/skyrl-train
    uv run integrations/openenv/prepare_dummy_dataset.py --output_dir ~/data/openenv --env_name echo_env

This creates training and validation datasets with example prompts for the specified environment. We provide dummy train set examples for ``echo_env`` and ``coding_env``.

Training
--------

We provide an example training script for Qwen2.5-1.5B-Instruct on OpenEnv environments:

.. code-block:: bash

    # Execute from skyrl-train directory
    cd SkyRL/skyrl-train
    bash integrations/openenv/run_openenv.sh

You can customize the training by setting environment variables:

.. code-block:: bash

    ENV_NAME=echo_env NUM_GPUS=2 bash integrations/openenv/run_openenv.sh


Supporting environments are: ``echo_env``, ``coding_env``, ``openspiel-env``, ``atari-env``, ``sumo-rl-env``, ``finrl-env``.

Example Reward Curve 
--------

Here's how the reward curve for the above example script looks like after a few steps: 

..  image:: images/openenv-reward.png
    :scale: 60%
    :align: center
Tips
~~~~~

- **Docker Resources**: Ensure sufficient Docker resources are available, especially for computationally intensive environments like Atari or OpenSpiel.
- **Generation Format**: The generation format is expected to be a single action wrapped in ``<action>...</action>`` tags for dummy testing. Change `_get_openenv_action` in :code_link:`integrations/openenv/env.py` for custom parsing logic.
- **Multi-Turn Interaction**: Pass ``MAX_TURNS=xx`` to enable multi-turn interaction.