Single-Agent Learner
============================================

.. toctree::
  :hidden:

  DQN_Learner <drl/dqn>
  C51_Learner <drl/c51>
  DDQN_Learner <drl/ddqn>
  DuelDQN_Learner <drl/dueldqn>
  NoisyDQN_Learner <drl/noisydqn>
  PerDQN_Learner <drl/perdqn>
  QRDQN_Learner <drl/qrdqn>
  PG_Learner <drl/pg>
  PPG_Learner <drl/ppg>
  PPOCLIP_Learner <drl/ppo_clip>
  PPOKL_Learner <drl/ppo_kl>
  PDQN_Learner <drl/pdqn>
  SPDQN_Learner <drl/spdqn>
  MPDQN_Learner <drl/mpdqn>
  A2C_Learner <drl/a2c>
  SAC_Learner <drl/sac>
  SACDIS_Learner <drl/sac_dis>
  DDPG_Learner <drl/ddpg>
  TD3_Learner <drl/td3>


.. list-table:: Supported learners by deep learning backend
   :header-rows: 1

   * - Learner
     - PyTorch 
     - TensorFlow
     - MindSpore
   * - :doc:`DQN <drl/dqn>`: Deep Q-Networks
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`C51DQN <drl/c51>`: Distributional Reinforcement Learning
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`Double DQN <drl/ddqn>`: DQN with Double Q-learning
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`Dueling DQN <drl/dueldqn>`: DQN with Dueling network
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`Noisy DQN <drl/noisydqn>`: DQN with Parameter Space Noise
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`PERDQN <drl/perdqn>`: DQN with Prioritized Experience Replay
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`QRDQN <drl/qrdqn>`: DQN with Quantile Regression
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`VPG <drl/pg>`: Vanilla Policy Gradient
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`PPG <drl/ppg>`: Phasic Policy Gradient
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`PPO <drl/ppo_clip>`: Proximal Policy Optimization (see also the :doc:`KL-penalty variant <drl/ppo_kl>`)
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`PDQN <drl/pdqn>`: Parameterised DQN
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`SPDQN <drl/spdqn>`: Split PDQN
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`MPDQN <drl/mpdqn>`: Multi-pass PDQN
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`A2C <drl/a2c>`: Advantage Actor-Critic
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`SAC <drl/sac>`: Soft Actor-Critic
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`SAC-Dis <drl/sac_dis>`: SAC for Discrete Actions
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`DDPG <drl/ddpg>`: Deep Deterministic Policy Gradient
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
   * - :doc:`TD3 <drl/td3>`: Twin Delayed DDPG
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`
     - .. centered:: :math:`\checkmark`