SkyPilot Examples
=================

Last updated: 09/04/2025.

This guide provides examples of running VERL reinforcement learning training on Kubernetes clusters or cloud platforms with GPU nodes using `SkyPilot <https://github.com/skypilot-org/skypilot>`_.

Installation and Configuration
-------------------------------

Step 1: Install SkyPilot
~~~~~~~~~~~~~~~~~~~~~~~~~

Choose the installation based on your target platform:

.. code-block:: bash

   # For Kubernetes only
   pip install "skypilot[kubernetes]"
   
   # For AWS
   pip install "skypilot[aws]"
   
   # For Google Cloud Platform
   pip install "skypilot[gcp]"
   
   # For Azure
   pip install "skypilot[azure]"
   
   # For multiple platforms
   pip install "skypilot[kubernetes,aws,gcp,azure]"

Step 2: Configure Your Platform
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See https://docs.skypilot.co/en/latest/getting-started/installation.html

Step 3: Set Up Environment Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Export necessary API keys for experiment tracking:

.. code-block:: bash

   # For Weights & Biases tracking
   export WANDB_API_KEY="your-wandb-api-key"
   
   # For HuggingFace gated models (if needed)
   export HF_TOKEN="your-huggingface-token"

Examples
--------

All example configurations are available in the `examples/skypilot/ <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ directory on GitHub. See the `README <https://github.com/volcengine/verl/blob/main/examples/skypilot/README.md>`_ for additional details.

PPO Training
~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y

Runs PPO training on GSM8K dataset using Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on examples in ``examples/ppo_trainer/``.

`View verl-ppo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-ppo.yaml>`_

GRPO Training
~~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y

Runs GRPO (Group Relative Policy Optimization) training on MATH dataset using Qwen2.5-7B-Instruct model. Memory-optimized configuration for 2 nodes. Based on examples in ``examples/grpo_trainer/``.

`View verl-grpo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-grpo.yaml>`_

Multi-turn Tool Usage Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky launch -c verl-multiturn verl-multiturn-tools.yaml \
     --secret WANDB_API_KEY --secret HF_TOKEN -y

Single-node training with 8xH100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on examples in ``examples/sglang_multiturn/`` but uses vLLM instead of sglang.

`View verl-multiturn-tools.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-multiturn-tools.yaml>`_

Configuration
-------------

The example YAML files are pre-configured with:

- **Infrastructure**: Kubernetes clusters (``infra: k8s``) - can be changed to ``infra: aws`` or ``infra: gcp``, etc.
- **Docker Image**: VERL's official Docker image with CUDA 12.6 support
- **Setup**: Automatically clones and installs VERL from source
- **Datasets**: Downloads required datasets during setup phase
- **Ray Cluster**: Configures distributed training across nodes
- **Logging**: Supports Weights & Biases via ``--secret WANDB_API_KEY``
- **Models**: Supports gated HuggingFace models via ``--secret HF_TOKEN``

Launch Command Options
----------------------

- ``-c <name>``: Cluster name for managing the job
- ``--secret KEY``: Pass secrets for API keys (can be used multiple times)
- ``-y``: Skip confirmation prompt

Monitoring Your Jobs
--------------------

Check Cluster Status
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky status

View Logs
~~~~~~~~~

.. code-block:: bash

   sky logs verl-ppo  # View logs for the PPO job

SSH into Head Node
~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   ssh verl-ppo

Access Ray Dashboard
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   sky status --endpoint 8265 verl-ppo  # Get dashboard URL

Stop a Cluster
~~~~~~~~~~~~~~

.. code-block:: bash

   sky down verl-ppo
