Training Curves
===============

Safe reinforcement learning algorithms aim to maximize reward while satisfying safety constraints.
In this section, we present training curves for SafePO's algorithms on environments from `Safety-Gymnasium <https://github.com/PKU-Alignment/safety-gymnasium>`_.
Each tab below embeds the corresponding Weights & Biases report.
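
As a rough guide for reading the curves, the goal stated above is commonly written as a constrained optimization problem.
The notation here is an assumption introduced for illustration: :math:`J_R(\pi)` denotes the expected return of policy :math:`\pi`, :math:`J_C(\pi)` its expected cumulative cost, and :math:`d` a cost limit. None of these symbols are taken from SafePO itself.

.. math::

   \max_{\pi} \; J_R(\pi) \quad \text{subject to} \quad J_C(\pi) \le d

Under this reading, a run is doing well when its reward curve rises while its cost curve settles at or below the limit.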

Single-Agent
------------

First-Order
~~~~~~~~~~~

.. tab-set::

    .. tab-item:: CUP

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/CUP-Training-Curves--Vmlldzo1MTgxOTcx" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

    .. tab-item:: CPPO-PID

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/CPPO-PID-Training-Curves--Vmlldzo1MTgyMDE2" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

    .. tab-item:: FOCOPS

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/FOCOPS-Training-Curves--Vmlldzo1MTgyMDE0" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

    .. tab-item:: PPO-Lag

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/PPO-Lag-Training-Curves--Vmlldzo1MTgyMDE4" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>
        
Second-Order
~~~~~~~~~~~~

.. tab-set::

    .. tab-item:: CPO

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/CPO-Training-Curves--Vmlldzo1MTgxOTY4" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

    .. tab-item:: PCPO

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/PCPO-Training-Curves--Vmlldzo1MTgxOTY1" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

    .. tab-item:: RCPO

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/RCPO-Training-Curves--Vmlldzo1MTgxOTU5" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

    .. tab-item:: TRPO-Lag

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/TRPO-Lag-Training-Curves--Vmlldzo1MTgyMDI2" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

Multi-Agent
-----------

.. tab-set::

    .. tab-item:: HAPPO

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/HAPPO-Training-Curves--Vmlldzo1MTgxOTQ2" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

    .. tab-item:: MACPO

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/MACPO-Training-Curves--Vmlldzo1MTgxOTU1" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

    .. tab-item:: MAPPO

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/MAPPO-Training-Curves--Vmlldzo1MTgxOTQx" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>

    .. tab-item:: MAPPO-Lag

      .. raw:: html

         <iframe src="https://wandb.ai/pku_rl/SafePO/reports/MAPPO-Lag-Training-Curves--Vmlldzo1MTgxOTU0" style="border:none;width:90%; height:1000px" >

      .. raw:: html

         </iframe>