This code is based on the original implementation (https://github.com/sfujim/TD3)
of TD3 (https://arxiv.org/abs/1802.09477).

Supplementary material for the ICLR 2022 submission:

    Robust and Data-efficient Q-learning by Composite Value-estimation.
  
Requirements: PyTorch (1.1.0), gym (0.12.1), mujoco_py (2.0.2.2), MuJoCo, and a GPU with CUDA support

For TD3:            python td3.py -e GYM_ENVIRONMENT [-s SEED]
For Composite TD3:  python composite_td3.py -e GYM_ENVIRONMENT [-s SEED]
For TD3(Delta):     python td3_delta.py -e GYM_ENVIRONMENT [-s SEED]

For the motivation: python motivation.py
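
The training scripts above take a required -e option (Gym environment) and an optional -s option (seed). As a rough illustration only, an interface of that shape could be parsed with argparse as sketched below. The long option names (--env, --seed), the default seed of 0, and the environment ID HalfCheetah-v2 are assumptions made for this example; the actual scripts may be implemented differently.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented usage:
    #   python td3.py -e GYM_ENVIRONMENT [-s SEED]
    # Long option names and the default seed are assumptions, not taken from the scripts.
    parser = argparse.ArgumentParser(description="Train an agent on a Gym environment")
    parser.add_argument("-e", "--env", required=True, help="Gym environment ID")
    parser.add_argument("-s", "--seed", type=int, default=0, help="random seed (optional)")
    return parser


# Equivalent to running: python td3.py -e HalfCheetah-v2 -s 1
# (HalfCheetah-v2 is only an example environment ID.)
args = build_parser().parse_args(["-e", "HalfCheetah-v2", "-s", "1"])
print(args.env, args.seed)
```

Omitting -s in this sketch falls back to the assumed default seed, while omitting -e makes argparse exit with a usage error.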