DRL Agents
==========


.. toctree::
  :hidden:

  Agent <drl/basic_drl_class>
  DQN_Agent <drl/dqn>
  C51_Agent <drl/c51>
  DDQN_Agent <drl/ddqn>
  DuelDQN_Agent <drl/dueldqn>
  NoisyDQN_Agent <drl/noisydqn>
  PerDQN_Agent <drl/perdqn>
  QRDQN_Agent <drl/qrdqn>
  PG_Agent <drl/pg>
  PPG_Agent <drl/ppg>
  PPOCLIP_Agent <drl/ppo_clip>
  PPOKL_Agent <drl/ppo_kl>
  PDQN_Agent <drl/pdqn>
  SPDQN_Agent <drl/spdqn>
  MPDQN_Agent <drl/mpdqn>
  A2C_Agent <drl/a2c>
  SAC_Agent <drl/sac>
  SACDIS_Agent <drl/sac_dis>
  DDPG_Agent <drl/ddpg>
  TD3_Agent <drl/td3>


.. list-table::
   :header-rows: 1

   * - Agent
     - PyTorch
     - TensorFlow
     - MindSpore
   * - :doc:`DQN <drl/dqn>`: Deep Q-Networks
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`C51DQN <drl/c51>`: Distributional Reinforcement Learning (Categorical DQN)
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`Double DQN <drl/ddqn>`: DQN with Double Q-learning
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`Dueling DQN <drl/dueldqn>`: DQN with Dueling Network
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`Noisy DQN <drl/noisydqn>`: DQN with Parameter Space Noise
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`PERDQN <drl/perdqn>`: DQN with Prioritized Experience Replay
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`QRDQN <drl/qrdqn>`: DQN with Quantile Regression
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`VPG <drl/pg>`: Vanilla Policy Gradient
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`PPG <drl/ppg>`: Phasic Policy Gradient
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`PPO <drl/ppo_clip>`: Proximal Policy Optimization (clipped objective; see also the :doc:`KL-penalty variant <drl/ppo_kl>`)
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`PDQN <drl/pdqn>`: Parameterised DQN
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`SPDQN <drl/spdqn>`: Split PDQN
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`MPDQN <drl/mpdqn>`: Multi-pass PDQN
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`A2C <drl/a2c>`: Advantage Actor-Critic
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`SAC <drl/sac>`: Soft Actor-Critic
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`SAC-Dis <drl/sac_dis>`: SAC for Discrete Actions
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`DDPG <drl/ddpg>`: Deep Deterministic Policy Gradient
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`TD3 <drl/td3>`: Twin Delayed DDPG
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`