{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# OmniSafe Tutorial - Environment Customization From Scratch\n",
    "\n",
    "OmniSafe: https://github.com/PKU-Alignment/omnisafe\n",
    "\n",
    "Documentation: https://omnisafe.readthedocs.io/en/latest/\n",
    "\n",
    "Safety-Gymnasium: https://www.safety-gymnasium.com/\n",
    "\n",
    "[Safety-Gymnasium](https://www.safety-gymnasium.com/) is a highly scalable and customizable Safe Reinforcement Learning library, aiming to deliver a good view of benchmarking Safe Reinforcement Learning (Safe RL) algorithms and a more standardized setting of environments. \n",
    "\n",
    "## Introduction\n",
    "\n",
    "This section, along with [Tutorial 4: Environment Customization from Community](./4.Environment%20Customization%20from%20Community.ipynb), introduces how to enjoy the full set of training, recording, and saving frameworks provided by OmniSafe for customized environments. This section focuses on introducing beginners to SafeRL on how to create an environment from scratch, while [Tutorial 4: Environment Customization from Community](./4.Gymnasium%20Customization.ipynb) focuses on how to make minimal adaptations to existing community environments, such as [Gymnasium](https://github.com/Farama-Foundation/Gymnasium), to embed them in OmniSafe.\n",
    "\n",
    "Specifically, this section provides a simplest template for customizing environments. Through this template, you will understand:\n",
    "\n",
    "- How to create and register an environment in OmniSafe.\n",
    "- How to specify customization parameters when creating an environment.\n",
    "- How to record environment-specific information.\n",
    "\n",
    "## Quick Installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install via pip (ignore it if you have already installed).\n",
    "!pip install omnisafe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install from source (ignore it if you have already installed).\n",
    "## clone the repo\n",
    "!git clone https://github.com/PKU-Alignment/omnisafe\n",
    "%cd omnisafe\n",
    "\n",
    "## install it\n",
    "!pip install -e ."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Simplest Custom Environment Template\n",
    "The customized environment of OmniSafe can be implemented through a single file. We will introduce you to the simplest custom environment template, which will serve as a quick start.\n",
    "\n",
    "### Custom Environment Design\n",
    "Here, we will detail the design process of a simple random environment. If you are an expert in RL or an experienced researcher, you can skip this module to [Custom Environment Embedding](#custom-environment-embedding) or [Tutorial 4: Environment Customization from Community](./4.Gymnasium%20Customization.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import all we need\n",
    "from __future__ import annotations\n",
    "\n",
    "import random\n",
    "import omnisafe\n",
    "from typing import Any, ClassVar\n",
    "\n",
    "import torch\n",
    "from gymnasium import spaces\n",
    "\n",
    "from omnisafe.envs.core import CMDP, env_register, env_unregister"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define environment class\n",
    "class ExampleEnv(CMDP):\n",
    "    _support_envs: ClassVar[list[str]] = ['Example-v0']  # Supported task names\n",
    "\n",
    "    need_auto_reset_wrapper = True  # Whether `AutoReset` Wrapper is needed\n",
    "    need_time_limit_wrapper = True  # Whether `TimeLimit` Wrapper is needed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You need to pay attention to the following details in the above code:\n",
    "\n",
    "- **Task name definition** The supported task names for the environment are provided in `_support_envs`.\n",
    "- **Wrapper configuration** Automatic reset and time limit are defined by setting `need_auto_reset_wrapper` and `need_time_limit_wrapper`.\n",
    "- **Number of parallel environments** If your environment supports vectorized parallelism, set it through the `_num_envs` parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ExampleEnv(CMDP):\n",
    "    _support_envs: ClassVar[list[str]] = ['Example-v0', 'Example-v1']  # Supported task names\n",
    "\n",
    "    need_auto_reset_wrapper = True  # Whether `AutoReset` Wrapper is needed\n",
    "    need_time_limit_wrapper = True  # Whether `TimeLimit` Wrapper is needed\n",
    "\n",
    "    def __init__(self, env_id: str, **kwargs) -> None:\n",
    "        self._count = 0\n",
    "        self._num_envs = 1\n",
    "        self._observation_space = spaces.Box(low=-1.0, high=1.0, shape=(3,))\n",
    "        self._action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Complete the `__init__` method definition. Here, you need to specify the action space and observation space of the environment. You need to define these according to the specific task you are currently designing. For example:\n",
    "```python\n",
    "if env_id == 'Example-v0':\n",
    "    self._observation_space = spaces.Box(low=-1.0, high=1.0, shape=(3,))\n",
    "    self._action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))\n",
    "elif env_id == 'Example-v1':\n",
    "    self._observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,))\n",
    "    self._action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,))\n",
    "else:\n",
    "    raise NotImplementedError\n",
    "```\n",
    "**Note:** As it is necessary to provide a standard interface for the higher-level modules, please follow these two variable names, i.e., `self._observation_space` and `self._action_space`, when designing the environment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Complete the definition of methods related to environment initialization. `reset` and `set_seed` are the standard interfaces for OmniSafe environment initialization. Where `reset` resets the environment state and the step counter. Meanwhile, `set_seed` ensures the reproducibility of experiments by setting the random seed. The `max_episode_steps` method, decorated with `@property`, is used to pass the maximum number of steps per episode that need to be limited to the `TimeLimit` Wrapper. The code is as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ExampleEnv(CMDP):\n",
    "    _support_envs: ClassVar[list[str]] = ['Example-v0', 'Example-v1']  # Supported task names\n",
    "\n",
    "    need_auto_reset_wrapper = True  # Whether `AutoReset` Wrapper is needed\n",
    "    need_time_limit_wrapper = True  # Whether `TimeLimit` Wrapper is needed\n",
    "\n",
    "    def __init__(self, env_id: str, **kwargs) -> None:\n",
    "        self._count = 0\n",
    "        self._num_envs = 1\n",
    "        self._observation_space = spaces.Box(low=-1.0, high=1.0, shape=(3,))\n",
    "        self._action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))\n",
    "\n",
    "    def set_seed(self, seed: int) -> None:\n",
    "        random.seed(seed)\n",
    "\n",
    "    def reset(\n",
    "        self,\n",
    "        seed: int | None = None,\n",
    "        options: dict[str, Any] | None = None,\n",
    "    ) -> tuple[torch.Tensor, dict]:\n",
    "        if seed is not None:\n",
    "            self.set_seed(seed)\n",
    "        obs = torch.as_tensor(self._observation_space.sample())\n",
    "        self._count = 0\n",
    "        return obs, {}\n",
    "\n",
    "    @property\n",
    "    def max_episode_steps(self) -> None:\n",
    "        \"\"\"The max steps per episode.\"\"\"\n",
    "        return 10"
   ]
  },
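  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A side note on `set_seed`: the template seeds Python's `random` module, so only values drawn from `random` become reproducible. Observations come from `self._observation_space.sample()`, and Gymnasium spaces carry their own random generator that `random.seed` does not touch; calling the space's `seed` method is one way to cover it. The sketch below uses only the standard library, and `draw_rewards` is a hypothetical helper written for this illustration, to show what reseeding `random` guarantees.\n",
    "\n",
    "```python\n",
    "from __future__ import annotations\n",
    "\n",
    "import random\n",
    "\n",
    "\n",
    "def draw_rewards(seed: int, n: int = 3) -> list[float]:\n",
    "    # Hypothetical helper: reseed the global generator, then draw n values in [0, 2).\n",
    "    random.seed(seed)\n",
    "    return [2 * random.random() for _ in range(n)]\n",
    "\n",
    "\n",
    "first = draw_rewards(seed=0)\n",
    "second = draw_rewards(seed=0)\n",
    "other = draw_rewards(seed=1)\n",
    "\n",
    "print(first == second)  # same seed, same sequence\n",
    "print(first == other)  # different seed, different sequence\n",
    "```\n",
    "\n",
    "If your environment draws randomness from several sources, `set_seed` is the natural place to seed all of them."
   ]
  },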
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Complete the definition of functional methods. The `render` method is used for rendering the environment; the `close` method is used for cleanup after training ends."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ExampleEnv(CMDP):\n",
    "    _support_envs: ClassVar[list[str]] = ['Example-v0', 'Example-v1']  # Supported task names\n",
    "\n",
    "    need_auto_reset_wrapper = True  # Whether `AutoReset` Wrapper is needed\n",
    "    need_time_limit_wrapper = True  # Whether `TimeLimit` Wrapper is needed\n",
    "\n",
    "    def __init__(self, env_id: str, **kwargs) -> None:\n",
    "        self._count = 0\n",
    "        self._num_envs = 1\n",
    "        self._observation_space = spaces.Box(low=-1.0, high=1.0, shape=(3,))\n",
    "        self._action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))\n",
    "\n",
    "    def set_seed(self, seed: int) -> None:\n",
    "        random.seed(seed)\n",
    "\n",
    "    def reset(\n",
    "        self,\n",
    "        seed: int | None = None,\n",
    "        options: dict[str, Any] | None = None,\n",
    "    ) -> tuple[torch.Tensor, dict]:\n",
    "        if seed is not None:\n",
    "            self.set_seed(seed)\n",
    "        obs = torch.as_tensor(self._observation_space.sample())\n",
    "        self._count = 0\n",
    "        return obs, {}\n",
    "\n",
    "    @property\n",
    "    def max_episode_steps(self) -> None:\n",
    "        \"\"\"The max steps per episode.\"\"\"\n",
    "        return 10\n",
    "\n",
    "    def render(self) -> Any:\n",
    "        pass\n",
    "\n",
    "    def close(self) -> None:\n",
    "        pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Complete the definition of the step method. Here is the core interaction logic of your customized environment. You only need to adjust according to the data input and output format in this example. You can also directly change the random interaction dynamics in this example to the dynamics of your environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ExampleEnv(CMDP):\n",
    "    _support_envs: ClassVar[list[str]] = ['Example-v0', 'Example-v1']  # Supported task names\n",
    "\n",
    "    need_auto_reset_wrapper = True  # Whether `AutoReset` Wrapper is needed\n",
    "    need_time_limit_wrapper = True  # Whether `TimeLimit` Wrapper is needed\n",
    "\n",
    "    def __init__(self, env_id: str, **kwargs) -> None:\n",
    "        self._count = 0\n",
    "        self._num_envs = 1\n",
    "        self._observation_space = spaces.Box(low=-1.0, high=1.0, shape=(3,))\n",
    "        self._action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))\n",
    "\n",
    "    def set_seed(self, seed: int) -> None:\n",
    "        random.seed(seed)\n",
    "\n",
    "    def reset(\n",
    "        self,\n",
    "        seed: int | None = None,\n",
    "        options: dict[str, Any] | None = None,\n",
    "    ) -> tuple[torch.Tensor, dict]:\n",
    "        if seed is not None:\n",
    "            self.set_seed(seed)\n",
    "        obs = torch.as_tensor(self._observation_space.sample())\n",
    "        self._count = 0\n",
    "        return obs, {}\n",
    "\n",
    "    @property\n",
    "    def max_episode_steps(self) -> None:\n",
    "        \"\"\"The max steps per episode.\"\"\"\n",
    "        return 10\n",
    "\n",
    "    def render(self) -> Any:\n",
    "        pass\n",
    "\n",
    "    def close(self) -> None:\n",
    "        pass\n",
    "\n",
    "    def step(\n",
    "        self,\n",
    "        action: torch.Tensor,\n",
    "    ) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, dict]:\n",
    "        self._count += 1\n",
    "        obs = torch.as_tensor(self._observation_space.sample())\n",
    "        reward = 2 * torch.as_tensor(random.random())\n",
    "        cost = 2 * torch.as_tensor(random.random())\n",
    "        terminated = torch.as_tensor(random.random() > 0.9)\n",
    "        truncated = torch.as_tensor(self._count > 10)\n",
    "        return obs, reward, cost, terminated, truncated, {'final_observation': obs}"
   ]
  },
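  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of swapping in your own dynamics, here is a minimal sketch of deterministic dynamics that could stand in for the random draws above. Everything in it is an assumption made for this example: a point moving on a line, a goal at `GOAL`, a hazard zone above `HAZARD_START`, a step budget `MAX_STEPS`, and the helper name `point_step`. It uses plain floats so that it runs without PyTorch; inside `step` you would wrap the results with `torch.as_tensor`, as the template does.\n",
    "\n",
    "```python\n",
    "from __future__ import annotations\n",
    "\n",
    "GOAL = 1.0  # assumed goal position\n",
    "HAZARD_START = 0.5  # assumed: positions above this incur a cost\n",
    "MAX_STEPS = 10  # assumed step budget per episode\n",
    "\n",
    "\n",
    "def point_step(\n",
    "    position: float,\n",
    "    action: float,\n",
    "    count: int,\n",
    ") -> tuple[float, float, float, bool, bool]:\n",
    "    # Clip the action, move the point, and keep it inside [-1, 1].\n",
    "    action = max(-1.0, min(1.0, action))\n",
    "    position = max(-1.0, min(1.0, position + 0.25 * action))\n",
    "    reward = -abs(GOAL - position)  # closer to the goal is better\n",
    "    cost = 1.0 if position > HAZARD_START else 0.0  # safety signal\n",
    "    terminated = position >= GOAL  # task finished\n",
    "    truncated = count >= MAX_STEPS  # step budget exhausted\n",
    "    return position, reward, cost, terminated, truncated\n",
    "\n",
    "\n",
    "position, count = 0.0, 0\n",
    "while True:\n",
    "    count += 1\n",
    "    position, reward, cost, terminated, truncated = point_step(position, 1.0, count)\n",
    "    print(f'step {count}: position={position:.2f} reward={reward:.2f} cost={cost}')\n",
    "    if terminated or truncated:\n",
    "        break\n",
    "```\n",
    "\n",
    "The point is the separation of signals: `reward` measures task progress, `cost` measures constraint violation, and `terminated` and `truncated` report two different reasons for an episode to end."
   ]
  },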
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's try to run the environment for 10 time steps and observe the interaction information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------------\n",
      "obs: tensor([0.5903, 0.9825, 0.6966])\n",
      "reward: 1.6888437271118164\n",
      "cost: 1.5159088373184204\n",
      "terminated: False\n",
      "truncated: False\n",
      "********************\n",
      "--------------------\n",
      "obs: tensor([-0.0615,  0.8932, -0.1051])\n",
      "reward: 0.5178334712982178\n",
      "cost: 1.0225493907928467\n",
      "terminated: False\n",
      "truncated: False\n",
      "********************\n",
      "--------------------\n",
      "obs: tensor([ 0.7570, -0.0613,  0.9682])\n",
      "reward: 1.5675971508026123\n",
      "cost: 0.6066254377365112\n",
      "terminated: False\n",
      "truncated: False\n",
      "********************\n",
      "--------------------\n",
      "obs: tensor([ 0.1937,  0.5437, -0.4663])\n",
      "reward: 1.1667640209197998\n",
      "cost: 1.8162257671356201\n",
      "terminated: False\n",
      "truncated: False\n",
      "********************\n",
      "--------------------\n",
      "obs: tensor([-0.9458, -0.1812,  0.4118])\n",
      "reward: 0.5636757016181946\n",
      "cost: 1.511608362197876\n",
      "terminated: False\n",
      "truncated: False\n",
      "********************\n",
      "--------------------\n",
      "obs: tensor([-0.9290, -0.0350,  0.3893])\n",
      "reward: 0.5010126829147339\n",
      "cost: 1.8194924592971802\n",
      "terminated: True\n",
      "truncated: False\n",
      "********************\n"
     ]
    }
   ],
   "source": [
    "env = ExampleEnv(env_id='Example-v0')\n",
    "env.reset(seed=0)\n",
    "while True:\n",
    "    action = env.action_space.sample()\n",
    "    obs, reward, cost, terminated, truncated, info = env.step(action)\n",
    "    print('-' * 20)\n",
    "    print(f'obs: {obs}')\n",
    "    print(f'reward: {reward}')\n",
    "    print(f'cost: {cost}')\n",
    "    print(f'terminated: {terminated}')\n",
    "    print(f'truncated: {truncated}')\n",
    "    print('*' * 20)\n",
    "    if terminated or truncated:\n",
    "        break\n",
    "env.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Congratulations! You have successfully completed the basic environment definition. Next, we will introduce how to register this environment into OmniSafe, and implement steps such as environment parameter passing, interaction information recording, algorithm training, and result saving."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Custom Environment Embedding\n",
    "\n",
    "### Quick Training\n",
    "\n",
    "Thanks to the carefully designed registration mechanism of OmniSafe, we only need one decorator to register this environment into the OmniSafe's environment list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "@env_register\n",
    "class ExampleEnv(ExampleEnv):\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Registering an environment with the same name will cause an error, due to **environment name conflict**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@env_register\n",
    "class CustomExampleEnv(ExampleEnv):\n",
    "    example_configs = 1\n",
    "\n",
    "\n",
    "env = CustomExampleEnv('Example-v0')\n",
    "env.example_configs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, you need to manually unregister the environment first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "@env_unregister\n",
    "class CustomExampleEnv(ExampleEnv):\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Afterwards, you can re-register the environment. In this tutorial, we will nest both the `env_register` and `env_unregister` decorators together. This is to avoid errors caused by repeated registration of the environment, ensuring that the environment is registered only once, so users can modify and run the code multiple times while reading this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CustomExampleEnv has not been registered yet\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@env_register\n",
    "@env_unregister\n",
    "class CustomExampleEnv(ExampleEnv):\n",
    "    example_configs = 2\n",
    "\n",
    "\n",
    "env = CustomExampleEnv('Example-v0')\n",
    "env.example_configs"
   ]
  },
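  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The order of the stacked decorators matters: Python applies decorators bottom-up, so `env_unregister` runs first and `env_register` second. The following is a minimal sketch of that pattern with a toy registry. `REGISTRY`, `register`, `unregister`, and `ToyEnv` are hypothetical names written for this illustration; they are not OmniSafe's implementation.\n",
    "\n",
    "```python\n",
    "from __future__ import annotations\n",
    "\n",
    "REGISTRY: dict[str, type] = {}  # hypothetical table: class name -> class\n",
    "\n",
    "\n",
    "def register(cls: type) -> type:\n",
    "    # Refuse duplicates, mimicking a name conflict on repeated registration.\n",
    "    if cls.__name__ in REGISTRY:\n",
    "        raise ValueError(f'{cls.__name__} is already registered')\n",
    "    REGISTRY[cls.__name__] = cls\n",
    "    return cls\n",
    "\n",
    "\n",
    "def unregister(cls: type) -> type:\n",
    "    # Drop any earlier entry with the same class name; do nothing if absent.\n",
    "    REGISTRY.pop(cls.__name__, None)\n",
    "    return cls\n",
    "\n",
    "\n",
    "@register\n",
    "class ToyEnv:\n",
    "    version = 1\n",
    "\n",
    "\n",
    "@register\n",
    "@unregister  # applied first, so the stale entry is gone before `register` runs\n",
    "class ToyEnv:\n",
    "    version = 2\n",
    "\n",
    "\n",
    "print(REGISTRY['ToyEnv'].version)\n",
    "```\n",
    "\n",
    "Reversing the two decorators would register the class and then immediately remove it, which is why the unregistering decorator sits closest to the class."
   ]
  },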
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Subsequently, you can use the algorithms in OmniSafe to train this custom environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading PPOLag.yaml from /home/safepo/dev-env/omnisafe_zjy/omnisafe/utils/../configs/on-policy/PPOLag.yaml\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Logging data to .</span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">/runs/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff; font-weight: bold\">PPOLag-</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">{Example-v0}</span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">/seed-000-2024-04-09-15-08-37/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff; font-weight: bold\">progress.csv</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36mLogging data to .\u001b[0m\u001b[1;35m/runs/\u001b[0m\u001b[1;95mPPOLag-\u001b[0m\u001b[1;36m{\u001b[0m\u001b[1;36mExample-v0\u001b[0m\u001b[1;36m}\u001b[0m\u001b[1;35m/seed-000-2024-04-09-15-08-37/\u001b[0m\u001b[1;95mprogress.csv\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">Save with config in config.json</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;33mSave with config in config.json\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">INFO: Start training</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mINFO: Start training\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">/home/safepo/anaconda3/envs/dev-env/lib/python3.8/site-packages/rich/live.py:231: UserWarning: install \"ipywidgets\"\n",
       "for Jupyter support\n",
       "  warnings.warn('install \"ipywidgets\" for Jupyter support')\n",
       "</pre>\n"
      ],
      "text/plain": [
       "/home/safepo/anaconda3/envs/dev-env/lib/python3.8/site-packages/rich/live.py:231: UserWarning: install \"ipywidgets\"\n",
       "for Jupyter support\n",
       "  warnings.warn('install \"ipywidgets\" for Jupyter support')\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
       "┃<span style=\"font-weight: bold\"> Metrics                        </span>┃<span style=\"font-weight: bold\"> Value                   </span>┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
       "│ Metrics/EpRet                  │ 5.625942230224609       │\n",
       "│ Metrics/EpCost                 │ 6.960921287536621       │\n",
       "│ Metrics/EpLen                  │ 5.0                     │\n",
       "│ Train/Epoch                    │ 0.0                     │\n",
       "│ Train/Entropy                  │ 1.4189385175704956      │\n",
       "│ Train/KL                       │ 0.0002234023268101737   │\n",
       "│ Train/StopIter                 │ 1.0                     │\n",
       "│ Train/PolicyRatio/Mean         │ 1.0                     │\n",
       "│ Train/PolicyRatio/Min          │ 1.0                     │\n",
       "│ Train/PolicyRatio/Max          │ 1.0                     │\n",
       "│ Train/PolicyRatio/Std          │ 0.0                     │\n",
       "│ Train/LR                       │ 0.00019999999494757503  │\n",
       "│ Train/PolicyStd                │ 1.0                     │\n",
       "│ TotalEnvSteps                  │ 10.0                    │\n",
       "│ Loss/Loss_pi                   │ 7.748603536583687e-08   │\n",
       "│ Loss/Loss_pi/Delta             │ 7.748603536583687e-08   │\n",
       "│ Value/Adv                      │ -1.7881394143159923e-08 │\n",
       "│ Loss/Loss_reward_critic        │ 10.457597732543945      │\n",
       "│ Loss/Loss_reward_critic/Delta  │ 10.457597732543945      │\n",
       "│ Value/reward                   │ -0.012156231328845024   │\n",
       "│ Loss/Loss_cost_critic          │ 18.316673278808594      │\n",
       "│ Loss/Loss_cost_critic/Delta    │ 18.316673278808594      │\n",
       "│ Value/cost                     │ 0.1599183827638626      │\n",
       "│ Time/Total                     │ 0.03895211219787598     │\n",
       "│ Time/Rollout                   │ 0.021677017211914062    │\n",
       "│ Time/Update                    │ 0.01619410514831543     │\n",
       "│ Time/Epoch                     │ 0.0379033088684082      │\n",
       "│ Time/FPS                       │ 263.8358459472656       │\n",
       "│ Metrics/LagrangeMultiplier/Mea │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Min │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Max │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Std │ 0.0                     │\n",
       "└────────────────────────────────┴─────────────────────────┘\n",
       "</pre>\n"
      ],
      "text/plain": [
       "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
       "┃\u001b[1m \u001b[0m\u001b[1mMetrics                       \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mValue                  \u001b[0m\u001b[1m \u001b[0m┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
       "│ Metrics/EpRet                  │ 5.625942230224609       │\n",
       "│ Metrics/EpCost                 │ 6.960921287536621       │\n",
       "│ Metrics/EpLen                  │ 5.0                     │\n",
       "│ Train/Epoch                    │ 0.0                     │\n",
       "│ Train/Entropy                  │ 1.4189385175704956      │\n",
       "│ Train/KL                       │ 0.0002234023268101737   │\n",
       "│ Train/StopIter                 │ 1.0                     │\n",
       "│ Train/PolicyRatio/Mean         │ 1.0                     │\n",
       "│ Train/PolicyRatio/Min          │ 1.0                     │\n",
       "│ Train/PolicyRatio/Max          │ 1.0                     │\n",
       "│ Train/PolicyRatio/Std          │ 0.0                     │\n",
       "│ Train/LR                       │ 0.00019999999494757503  │\n",
       "│ Train/PolicyStd                │ 1.0                     │\n",
       "│ TotalEnvSteps                  │ 10.0                    │\n",
       "│ Loss/Loss_pi                   │ 7.748603536583687e-08   │\n",
       "│ Loss/Loss_pi/Delta             │ 7.748603536583687e-08   │\n",
       "│ Value/Adv                      │ -1.7881394143159923e-08 │\n",
       "│ Loss/Loss_reward_critic        │ 10.457597732543945      │\n",
       "│ Loss/Loss_reward_critic/Delta  │ 10.457597732543945      │\n",
       "│ Value/reward                   │ -0.012156231328845024   │\n",
       "│ Loss/Loss_cost_critic          │ 18.316673278808594      │\n",
       "│ Loss/Loss_cost_critic/Delta    │ 18.316673278808594      │\n",
       "│ Value/cost                     │ 0.1599183827638626      │\n",
       "│ Time/Total                     │ 0.03895211219787598     │\n",
       "│ Time/Rollout                   │ 0.021677017211914062    │\n",
       "│ Time/Update                    │ 0.01619410514831543     │\n",
       "│ Time/Epoch                     │ 0.0379033088684082      │\n",
       "│ Time/FPS                       │ 263.8358459472656       │\n",
       "│ Metrics/LagrangeMultiplier/Mea │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Min │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Max │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Std │ 0.0                     │\n",
       "└────────────────────────────────┴─────────────────────────┘\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Warning: trajectory cut off when rollout by epoch at </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">10.0</span><span style=\"color: #008000; text-decoration-color: #008000\"> steps.</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mWarning: trajectory cut off when rollout by epoch at \u001b[0m\u001b[1;36m10.0\u001b[0m\u001b[32m steps.\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
       "┃<span style=\"font-weight: bold\"> Metrics                        </span>┃<span style=\"font-weight: bold\"> Value                  </span>┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
       "│ Metrics/EpRet                  │ 7.8531365394592285     │\n",
       "│ Metrics/EpCost                 │ 7.931504726409912      │\n",
       "│ Metrics/EpLen                  │ 6.666666507720947      │\n",
       "│ Train/Epoch                    │ 1.0                    │\n",
       "│ Train/Entropy                  │ 1.4192386865615845     │\n",
       "│ Train/KL                       │ 8.405959670199081e-05  │\n",
       "│ Train/StopIter                 │ 1.0                    │\n",
       "│ Train/PolicyRatio/Mean         │ 1.0                    │\n",
       "│ Train/PolicyRatio/Min          │ 1.0                    │\n",
       "│ Train/PolicyRatio/Max          │ 1.0                    │\n",
       "│ Train/PolicyRatio/Std          │ 0.0                    │\n",
       "│ Train/LR                       │ 9.999999747378752e-05  │\n",
       "│ Train/PolicyStd                │ 1.0003000497817993     │\n",
       "│ TotalEnvSteps                  │ 20.0                   │\n",
       "│ Loss/Loss_pi                   │ -8.940696716308594e-08 │\n",
       "│ Loss/Loss_pi/Delta             │ -1.668930025289228e-07 │\n",
       "│ Value/Adv                      │ 8.940696716308594e-08  │\n",
       "│ Loss/Loss_reward_critic        │ 37.962928771972656     │\n",
       "│ Loss/Loss_reward_critic/Delta  │ 27.50533103942871      │\n",
       "│ Value/reward                   │ -0.00784378219395876   │\n",
       "│ Loss/Loss_cost_critic          │ 25.662063598632812     │\n",
       "│ Loss/Loss_cost_critic/Delta    │ 7.345390319824219      │\n",
       "│ Value/cost                     │ 0.11082335561513901    │\n",
       "│ Time/Total                     │ 0.08216094970703125    │\n",
       "│ Time/Rollout                   │ 0.01664590835571289    │\n",
       "│ Time/Update                    │ 0.013554811477661133   │\n",
       "│ Time/Epoch                     │ 0.03022909164428711    │\n",
       "│ Time/FPS                       │ 330.8123779296875      │\n",
       "│ Metrics/LagrangeMultiplier/Mea │ 0.0                    │\n",
       "│ Metrics/LagrangeMultiplier/Min │ 0.0                    │\n",
       "│ Metrics/LagrangeMultiplier/Max │ 0.0                    │\n",
       "│ Metrics/LagrangeMultiplier/Std │ 0.0                    │\n",
       "└────────────────────────────────┴────────────────────────┘\n",
       "</pre>\n"
      ],
      "text/plain": [
       "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
       "┃\u001b[1m \u001b[0m\u001b[1mMetrics                       \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mValue                 \u001b[0m\u001b[1m \u001b[0m┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
       "│ Metrics/EpRet                  │ 7.8531365394592285     │\n",
       "│ Metrics/EpCost                 │ 7.931504726409912      │\n",
       "│ Metrics/EpLen                  │ 6.666666507720947      │\n",
       "│ Train/Epoch                    │ 1.0                    │\n",
       "│ Train/Entropy                  │ 1.4192386865615845     │\n",
       "│ Train/KL                       │ 8.405959670199081e-05  │\n",
       "│ Train/StopIter                 │ 1.0                    │\n",
       "│ Train/PolicyRatio/Mean         │ 1.0                    │\n",
       "│ Train/PolicyRatio/Min          │ 1.0                    │\n",
       "│ Train/PolicyRatio/Max          │ 1.0                    │\n",
       "│ Train/PolicyRatio/Std          │ 0.0                    │\n",
       "│ Train/LR                       │ 9.999999747378752e-05  │\n",
       "│ Train/PolicyStd                │ 1.0003000497817993     │\n",
       "│ TotalEnvSteps                  │ 20.0                   │\n",
       "│ Loss/Loss_pi                   │ -8.940696716308594e-08 │\n",
       "│ Loss/Loss_pi/Delta             │ -1.668930025289228e-07 │\n",
       "│ Value/Adv                      │ 8.940696716308594e-08  │\n",
       "│ Loss/Loss_reward_critic        │ 37.962928771972656     │\n",
       "│ Loss/Loss_reward_critic/Delta  │ 27.50533103942871      │\n",
       "│ Value/reward                   │ -0.00784378219395876   │\n",
       "│ Loss/Loss_cost_critic          │ 25.662063598632812     │\n",
       "│ Loss/Loss_cost_critic/Delta    │ 7.345390319824219      │\n",
       "│ Value/cost                     │ 0.11082335561513901    │\n",
       "│ Time/Total                     │ 0.08216094970703125    │\n",
       "│ Time/Rollout                   │ 0.01664590835571289    │\n",
       "│ Time/Update                    │ 0.013554811477661133   │\n",
       "│ Time/Epoch                     │ 0.03022909164428711    │\n",
       "│ Time/FPS                       │ 330.8123779296875      │\n",
       "│ Metrics/LagrangeMultiplier/Mea │ 0.0                    │\n",
       "│ Metrics/LagrangeMultiplier/Min │ 0.0                    │\n",
       "│ Metrics/LagrangeMultiplier/Max │ 0.0                    │\n",
       "│ Metrics/LagrangeMultiplier/Std │ 0.0                    │\n",
       "└────────────────────────────────┴────────────────────────┘\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">Warning: trajectory cut off when rollout by epoch at </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">9.0</span><span style=\"color: #008000; text-decoration-color: #008000\"> steps.</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mWarning: trajectory cut off when rollout by epoch at \u001b[0m\u001b[1;36m9.0\u001b[0m\u001b[32m steps.\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
       "┃<span style=\"font-weight: bold\"> Metrics                        </span>┃<span style=\"font-weight: bold\"> Value                   </span>┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
       "│ Metrics/EpRet                  │ 6.297085762023926       │\n",
       "│ Metrics/EpCost                 │ 6.2187700271606445      │\n",
       "│ Metrics/EpLen                  │ 5.25                    │\n",
       "│ Train/Epoch                    │ 2.0                     │\n",
       "│ Train/Entropy                  │ 1.419387936592102       │\n",
       "│ Train/KL                       │ 6.185231995914364e-06   │\n",
       "│ Train/StopIter                 │ 1.0                     │\n",
       "│ Train/PolicyRatio/Mean         │ 0.9999998211860657      │\n",
       "│ Train/PolicyRatio/Min          │ 0.9999998211860657      │\n",
       "│ Train/PolicyRatio/Max          │ 0.9999998211860657      │\n",
       "│ Train/PolicyRatio/Std          │ 0.0                     │\n",
       "│ Train/LR                       │ 0.0                     │\n",
       "│ Train/PolicyStd                │ 1.0004496574401855      │\n",
       "│ TotalEnvSteps                  │ 30.0                    │\n",
       "│ Loss/Loss_pi                   │ 7.152557657263969e-08   │\n",
       "│ Loss/Loss_pi/Delta             │ 1.6093254373572563e-07  │\n",
       "│ Value/Adv                      │ -1.4305115314527939e-07 │\n",
       "│ Loss/Loss_reward_critic        │ 34.879573822021484      │\n",
       "│ Loss/Loss_reward_critic/Delta  │ -3.083354949951172      │\n",
       "│ Value/reward                   │ 0.020589731633663177    │\n",
       "│ Loss/Loss_cost_critic          │ 27.62775230407715       │\n",
       "│ Loss/Loss_cost_critic/Delta    │ 1.965688705444336       │\n",
       "│ Value/cost                     │ 0.13300421833992004     │\n",
       "│ Time/Total                     │ 0.12445831298828125     │\n",
       "│ Time/Rollout                   │ 0.0154266357421875      │\n",
       "│ Time/Update                    │ 0.009746313095092773    │\n",
       "│ Time/Epoch                     │ 0.02520155906677246     │\n",
       "│ Time/FPS                       │ 396.81585693359375      │\n",
       "│ Metrics/LagrangeMultiplier/Mea │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Min │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Max │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Std │ 0.0                     │\n",
       "└────────────────────────────────┴─────────────────────────┘\n",
       "</pre>\n"
      ],
      "text/plain": [
       "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
       "┃\u001b[1m \u001b[0m\u001b[1mMetrics                       \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mValue                  \u001b[0m\u001b[1m \u001b[0m┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
       "│ Metrics/EpRet                  │ 6.297085762023926       │\n",
       "│ Metrics/EpCost                 │ 6.2187700271606445      │\n",
       "│ Metrics/EpLen                  │ 5.25                    │\n",
       "│ Train/Epoch                    │ 2.0                     │\n",
       "│ Train/Entropy                  │ 1.419387936592102       │\n",
       "│ Train/KL                       │ 6.185231995914364e-06   │\n",
       "│ Train/StopIter                 │ 1.0                     │\n",
       "│ Train/PolicyRatio/Mean         │ 0.9999998211860657      │\n",
       "│ Train/PolicyRatio/Min          │ 0.9999998211860657      │\n",
       "│ Train/PolicyRatio/Max          │ 0.9999998211860657      │\n",
       "│ Train/PolicyRatio/Std          │ 0.0                     │\n",
       "│ Train/LR                       │ 0.0                     │\n",
       "│ Train/PolicyStd                │ 1.0004496574401855      │\n",
       "│ TotalEnvSteps                  │ 30.0                    │\n",
       "│ Loss/Loss_pi                   │ 7.152557657263969e-08   │\n",
       "│ Loss/Loss_pi/Delta             │ 1.6093254373572563e-07  │\n",
       "│ Value/Adv                      │ -1.4305115314527939e-07 │\n",
       "│ Loss/Loss_reward_critic        │ 34.879573822021484      │\n",
       "│ Loss/Loss_reward_critic/Delta  │ -3.083354949951172      │\n",
       "│ Value/reward                   │ 0.020589731633663177    │\n",
       "│ Loss/Loss_cost_critic          │ 27.62775230407715       │\n",
       "│ Loss/Loss_cost_critic/Delta    │ 1.965688705444336       │\n",
       "│ Value/cost                     │ 0.13300421833992004     │\n",
       "│ Time/Total                     │ 0.12445831298828125     │\n",
       "│ Time/Rollout                   │ 0.0154266357421875      │\n",
       "│ Time/Update                    │ 0.009746313095092773    │\n",
       "│ Time/Epoch                     │ 0.02520155906677246     │\n",
       "│ Time/FPS                       │ 396.81585693359375      │\n",
       "│ Metrics/LagrangeMultiplier/Mea │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Min │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Max │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Std │ 0.0                     │\n",
       "└────────────────────────────────┴─────────────────────────┘\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "(6.297085762023926, 6.2187700271606445, 5.25)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "custom_cfgs = {\n",
    "    'train_cfgs': {\n",
    "        'total_steps': 30,\n",
    "    },\n",
    "    'algo_cfgs': {\n",
    "        'steps_per_epoch': 10,\n",
    "        'update_iters': 1,\n",
    "    },\n",
    "}\n",
    "agent = omnisafe.Agent('PPOLag', 'Example-v0', custom_cfgs=custom_cfgs)\n",
    "agent.learn()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well done! We have embedded this customized environment in OmniSafe and trained on it. Next, we explore how to specify hyperparameters for the environment.\n",
    "\n",
    "### Parameter Setting\n",
    "\n",
    "Consider a new example environment that requires a parameter named `num_agents`. We will show how to set this parameter without modifying OmniSafe's code."
   ]
  },
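  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a mental model for this subsection, the pattern can be sketched without OmniSafe at all. The names `TOY_REGISTRY`, `toy_register`, `toy_make`, and `ToyEnv` are hypothetical and exist only for illustration; the sketch assumes that environment options reach the constructor as keyword arguments, and it is not OmniSafe's actual implementation.\n",
    "\n",
    "```python\n",
    "from typing import Any, ClassVar\n",
    "\n",
    "# Hypothetical stand-ins for illustration; these are not OmniSafe objects.\n",
    "TOY_REGISTRY: dict = {}\n",
    "\n",
    "\n",
    "def toy_register(cls):\n",
    "    # Map every supported environment id to the class that implements it.\n",
    "    for env_id in cls._support_envs:\n",
    "        TOY_REGISTRY[env_id] = cls\n",
    "    return cls\n",
    "\n",
    "\n",
    "@toy_register\n",
    "class ToyEnv:\n",
    "    _support_envs: ClassVar[list] = ['Toy-v0']\n",
    "\n",
    "    def __init__(self, env_id: str, **kwargs: Any) -> None:\n",
    "        self.env_id = env_id\n",
    "        # Fall back to the default when the caller does not pass the option.\n",
    "        self.num_agents = kwargs.get('num_agents', 1)\n",
    "\n",
    "\n",
    "def toy_make(env_id: str, env_cfgs=None):\n",
    "    # Look up the class by id and forward the config as keyword arguments.\n",
    "    return TOY_REGISTRY[env_id](env_id, **(env_cfgs or {}))\n",
    "\n",
    "\n",
    "default_env = toy_make('Toy-v0')\n",
    "custom_env = toy_make('Toy-v0', {'num_agents': 2})\n",
    "print(default_env.num_agents, custom_env.num_agents)\n",
    "```\n",
    "\n",
    "Because the option is read with `kwargs.get`, callers that omit it still get a working environment, and callers that supply it never need to touch the class definition."
   ]
  },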
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NewExampleEnv has not been registered yet\n"
     ]
    }
   ],
   "source": [
    "@env_register\n",
    "@env_unregister\n",
    "class NewExampleEnv(ExampleEnv):  # make a new environment\n",
    "    _support_envs: ClassVar[list[str]] = ['NewExample-v0', 'NewExample-v1']\n",
    "    num_agents: ClassVar[int] = 1  # class-level default\n",
    "\n",
    "    def __init__(self, env_id: str, **kwargs) -> None:\n",
    "        super().__init__(env_id, **kwargs)\n",
    "        # use the value passed by the caller, falling back to the default\n",
    "        self.num_agents = kwargs.get('num_agents', 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, the `num_agents` parameter is set to a default value: `1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_env = NewExampleEnv('NewExample-v0')\n",
    "new_env.num_agents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below we show how to override this parameter through OmniSafe's `custom_cfgs` interface when constructing the agent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading PPOLag.yaml from /home/safepo/dev-env/omnisafe_zjy/omnisafe/utils/../configs/on-policy/PPOLag.yaml\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Logging data to .</span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">/runs/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff; font-weight: bold\">PPOLag-</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">{NewExample-v0}</span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">/seed-000-2024-04-09-15-08-46/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff; font-weight: bold\">progress.csv</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36mLogging data to .\u001b[0m\u001b[1;35m/runs/\u001b[0m\u001b[1;95mPPOLag-\u001b[0m\u001b[1;36m{\u001b[0m\u001b[1;36mNewExample-v0\u001b[0m\u001b[1;36m}\u001b[0m\u001b[1;35m/seed-000-2024-04-09-15-08-46/\u001b[0m\u001b[1;95mprogress.csv\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">Save with config in config.json</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;33mSave with config in config.json\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "custom_cfgs.update({'env_cfgs': {'num_agents': 2}})\n",
    "agent = omnisafe.Agent('PPOLag', 'NewExample-v0', custom_cfgs=custom_cfgs)\n",
    "agent.agent._env._env.num_agents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Excellent! We have set `num_agents` to 2. This means we have successfully implemented hyperparameter setting without modifying the code.\n",
    "\n",
    "### Training Information Recording\n",
    "\n",
    "While running the training code, you may have noticed that OmniSafe records training information through `Logger`, for example:\n",
    "\n",
    "```bash\n",
    "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
    "┃ Metrics                        ┃ Value                   ┃\n",
    "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
    "│ Metrics/EpRet                  │ 2.046875                │\n",
    "│ Metrics/EpCost                 │ 2.89453125              │\n",
    "│ Metrics/EpLen                  │ 3.25                    │\n",
    "│ Train/Epoch                    │ 3.0                     │\n",
    "...\n",
    "```\n",
    "So, can we output information from the environment into the log? The answer is yes, and this process also does not require modifying OmniSafe's code. You only need to implement two standard interfaces:\n",
    "1. In the `__init__` function, add the information you want to output to `self.env_spec_log`.\n",
    "2. Instantiate the `spec_log` function to record the required information.\n",
    "\n",
    "**Please note:** Currently, OmniSafe only supports recording this information at the end of each epoch, not after each step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "@env_register\n",
    "@env_unregister\n",
    "class NewExampleEnv(ExampleEnv):\n",
    "    _support_envs: ClassVar[list[str]] = ['NewExample-v0', 'NewExample-v1']\n",
    "\n",
    "    # define what to log\n",
    "    def __init__(self, env_id: str, **kwargs) -> None:\n",
    "        super(NewExampleEnv, self).__init__(env_id, **kwargs)\n",
    "        self.env_spec_log = {'Env/Success_counts': 0}\n",
    "\n",
    "    # interact with the environment and log\n",
    "    def step(self, action):\n",
    "        obs, reward, cost, terminated, truncated, info = super().step(action)\n",
    "        success = int(reward > cost)\n",
    "        self.env_spec_log['Env/Success_counts'] += success\n",
    "        return obs, reward, cost, terminated, truncated, info\n",
    "\n",
    "    # write to logger\n",
    "    def spec_log(self, logger) -> dict[str, Any]:\n",
    "        logger.store({'Env/Success_counts': self.env_spec_log['Env/Success_counts']})\n",
    "        self.env_spec_log['Env/Success_counts'] = 0"
   ]
  },
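  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The accumulate-then-flush pattern in the cell above can be checked in isolation before starting a training run. The sketch below is library-free: `ToyLogger` and `ToyCounterEnv` are hypothetical stand-ins, and the only assumption about the logger is that it exposes a `store` method accepting a dictionary, as used above. It does not reproduce how OmniSafe aggregates stored values.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "\n",
    "class ToyLogger:\n",
    "    # Hypothetical stand-in: keeps every stored value in a list per key.\n",
    "    def __init__(self) -> None:\n",
    "        self.buffer = defaultdict(list)\n",
    "\n",
    "    def store(self, data: dict) -> None:\n",
    "        for key, value in data.items():\n",
    "            self.buffer[key].append(value)\n",
    "\n",
    "\n",
    "class ToyCounterEnv:\n",
    "    # Hypothetical stand-in: counts successes per step, flushes on spec_log.\n",
    "    def __init__(self) -> None:\n",
    "        self.env_spec_log = {'Env/Success_counts': 0}\n",
    "\n",
    "    def step(self, reward: float, cost: float) -> None:\n",
    "        # Same success rule as above: reward strictly greater than cost.\n",
    "        self.env_spec_log['Env/Success_counts'] += int(reward > cost)\n",
    "\n",
    "    def spec_log(self, logger: ToyLogger) -> None:\n",
    "        logger.store({'Env/Success_counts': self.env_spec_log['Env/Success_counts']})\n",
    "        # Reset so the next epoch starts counting from zero.\n",
    "        self.env_spec_log['Env/Success_counts'] = 0\n",
    "\n",
    "\n",
    "toy_logger = ToyLogger()\n",
    "toy_env = ToyCounterEnv()\n",
    "for reward, cost in [(1.0, 0.5), (0.2, 0.9), (0.8, 0.1)]:\n",
    "    toy_env.step(reward, cost)\n",
    "toy_env.spec_log(toy_logger)  # end of the first toy epoch\n",
    "toy_env.step(0.3, 0.4)\n",
    "toy_env.spec_log(toy_logger)  # end of the second toy epoch\n",
    "print(dict(toy_logger.buffer))\n",
    "```\n",
    "\n",
    "Resetting the counter inside `spec_log` keeps each logged value scoped to the interval since the previous flush; without the reset, the logged value would be a running total."
   ]
  },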
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we will briefly train and observe whether this information has been successfully recorded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading PPOLag.yaml from /home/safepo/dev-env/omnisafe_zjy/omnisafe/utils/../configs/on-policy/PPOLag.yaml\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Logging data to .</span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">/runs/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff; font-weight: bold\">PPOLag-</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">{NewExample-v0}</span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">/seed-000-2024-04-09-15-08-52/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff; font-weight: bold\">progress.csv</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36mLogging data to .\u001b[0m\u001b[1;35m/runs/\u001b[0m\u001b[1;95mPPOLag-\u001b[0m\u001b[1;36m{\u001b[0m\u001b[1;36mNewExample-v0\u001b[0m\u001b[1;36m}\u001b[0m\u001b[1;35m/seed-000-2024-04-09-15-08-52/\u001b[0m\u001b[1;95mprogress.csv\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">Save with config in config.json</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;33mSave with config in config.json\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008000; text-decoration-color: #008000\">INFO: Start training</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[32mINFO: Start training\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
       "┃<span style=\"font-weight: bold\"> Metrics                        </span>┃<span style=\"font-weight: bold\"> Value                   </span>┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
       "│ Metrics/EpRet                  │ 5.625942230224609       │\n",
       "│ Metrics/EpCost                 │ 6.960921287536621       │\n",
       "│ Metrics/EpLen                  │ 5.0                     │\n",
       "│ Train/Epoch                    │ 0.0                     │\n",
       "│ Train/Entropy                  │ 1.4189385175704956      │\n",
       "│ Train/KL                       │ 0.00026566203450784087  │\n",
       "│ Train/StopIter                 │ 1.0                     │\n",
       "│ Train/PolicyRatio/Mean         │ 1.0                     │\n",
       "│ Train/PolicyRatio/Min          │ 1.0                     │\n",
       "│ Train/PolicyRatio/Max          │ 1.0                     │\n",
       "│ Train/PolicyRatio/Std          │ 0.0                     │\n",
       "│ Train/LR                       │ 0.0                     │\n",
       "│ Train/PolicyStd                │ 1.0                     │\n",
       "│ TotalEnvSteps                  │ 10.0                    │\n",
       "│ Loss/Loss_pi                   │ -2.9802322387695312e-08 │\n",
       "│ Loss/Loss_pi/Delta             │ -2.9802322387695312e-08 │\n",
       "│ Value/Adv                      │ 5.9604645663569045e-09  │\n",
       "│ Loss/Loss_reward_critic        │ 10.46424674987793       │\n",
       "│ Loss/Loss_reward_critic/Delta  │ 10.46424674987793       │\n",
       "│ Value/reward                   │ -0.017885426059365273   │\n",
       "│ Loss/Loss_cost_critic          │ 18.490144729614258      │\n",
       "│ Loss/Loss_cost_critic/Delta    │ 18.490144729614258      │\n",
       "│ Value/cost                     │ 0.13730722665786743     │\n",
       "│ Time/Total                     │ 0.0326535701751709      │\n",
       "│ Time/Rollout                   │ 0.019308805465698242    │\n",
       "│ Time/Update                    │ 0.012392044067382812    │\n",
       "│ Time/Epoch                     │ 0.03173708915710449     │\n",
       "│ Time/FPS                       │ 315.0982360839844       │\n",
       "│ Env/Success_counts             │ 1.5                     │\n",
       "│ Metrics/LagrangeMultiplier/Mea │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Min │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Max │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Std │ 0.0                     │\n",
       "└────────────────────────────────┴─────────────────────────┘\n",
       "</pre>\n"
      ],
      "text/plain": [
       "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
       "┃\u001b[1m \u001b[0m\u001b[1mMetrics                       \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mValue                  \u001b[0m\u001b[1m \u001b[0m┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
       "│ Metrics/EpRet                  │ 5.625942230224609       │\n",
       "│ Metrics/EpCost                 │ 6.960921287536621       │\n",
       "│ Metrics/EpLen                  │ 5.0                     │\n",
       "│ Train/Epoch                    │ 0.0                     │\n",
       "│ Train/Entropy                  │ 1.4189385175704956      │\n",
       "│ Train/KL                       │ 0.00026566203450784087  │\n",
       "│ Train/StopIter                 │ 1.0                     │\n",
       "│ Train/PolicyRatio/Mean         │ 1.0                     │\n",
       "│ Train/PolicyRatio/Min          │ 1.0                     │\n",
       "│ Train/PolicyRatio/Max          │ 1.0                     │\n",
       "│ Train/PolicyRatio/Std          │ 0.0                     │\n",
       "│ Train/LR                       │ 0.0                     │\n",
       "│ Train/PolicyStd                │ 1.0                     │\n",
       "│ TotalEnvSteps                  │ 10.0                    │\n",
       "│ Loss/Loss_pi                   │ -2.9802322387695312e-08 │\n",
       "│ Loss/Loss_pi/Delta             │ -2.9802322387695312e-08 │\n",
       "│ Value/Adv                      │ 5.9604645663569045e-09  │\n",
       "│ Loss/Loss_reward_critic        │ 10.46424674987793       │\n",
       "│ Loss/Loss_reward_critic/Delta  │ 10.46424674987793       │\n",
       "│ Value/reward                   │ -0.017885426059365273   │\n",
       "│ Loss/Loss_cost_critic          │ 18.490144729614258      │\n",
       "│ Loss/Loss_cost_critic/Delta    │ 18.490144729614258      │\n",
       "│ Value/cost                     │ 0.13730722665786743     │\n",
       "│ Time/Total                     │ 0.0326535701751709      │\n",
       "│ Time/Rollout                   │ 0.019308805465698242    │\n",
       "│ Time/Update                    │ 0.012392044067382812    │\n",
       "│ Time/Epoch                     │ 0.03173708915710449     │\n",
       "│ Time/FPS                       │ 315.0982360839844       │\n",
       "│ Env/Success_counts             │ 1.5                     │\n",
       "│ Metrics/LagrangeMultiplier/Mea │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Min │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Max │ 0.0                     │\n",
       "│ Metrics/LagrangeMultiplier/Std │ 0.0                     │\n",
       "└────────────────────────────────┴─────────────────────────┘\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "(5.625942230224609, 6.960921287536621, 5.0)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "custom_cfgs.update({'train_cfgs': {'total_steps': 10}})\n",
    "agent = omnisafe.Agent('PPOLag', 'NewExample-v0', custom_cfgs=custom_cfgs)\n",
    "agent.learn()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Nice! The code above writes the environment-specific entry `Env/Success_counts` to the log output, and doing so required no changes to OmniSafe's code.\n",
    "\n",
    "## Summary\n",
    "OmniSafe aims to become the foundational software for safe reinforcement learning. We will continue to refine OmniSafe's environment interface standards so that it can adapt to a wide range of safe reinforcement learning tasks and safety scenarios."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "omnisafe",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
