Code to reproduce the experimental results for the NeurIPS submission:

Distributional Reinforcement Learning for Risk-Sensitive Policies

===============================================

Hardware requirements:

Multicore CPU and 1 GPU (we used NVIDIA V100 GPUs on a cluster)
32 GB RAM


Software requirements:

Please refer to the file requirements.txt (for Conda)


===============================================

Section 4.1 Synthetic Data

To produce the plots in Figure 2:
 1. Run cvar_dynamic_synthetic.py and cvar_static_synthetic2.py to train the models
    and save the results (in a ./res folder).
    (GPU and/or CPU parallelization can help speed this up.)

 2. Run synthetic_plots.py (in ipython) to generate the plots.

===============================================

Section 4.2 Option Trading

To produce the plots in Figure 3:
 1. Run cvar_dynamic_option.py and cvar_static_option2.py to train the models
    and save the results (in a ./res folder).
    (GPU and/or CPU parallelization can help speed this up.)

 2. Run res_option_all.py (in ipython) to generate the left plot of Figure 3.
 3. Run res_option_eval_real3.py (in ipython) to evaluate the models and plot the results (Figure 3, right).

===============================================

Section 4.3 Atari Games

To produce the plots in Figure 4:
 1. Download the pfrl source code (release v0.3.0) from GitHub:
     https://github.com/pfnet/pfrl/releases/tag/v0.3.0
     (please check pfrl's requirements.txt for potential additional dependencies)
 2. Install the Atari gym environment (we used atari-py).
 3. Extract the pfrl subfolder to the current folder.
 4. Replace the following files with the versions in the provided "pfrl_patch" folder:
     pfrl/replay_buffer.py
     pfrl/agents/iqn.py
     pfrl/experiments/evaluator.py
     pfrl/wrappers/cast_observation.py
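The file replacement in step 4 can be scripted; below is a minimal sketch that assumes pfrl/ was extracted to the current folder and that pfrl_patch/ mirrors the same relative layout (if the patch folder is flat, adjust the source paths accordingly):

```python
# Sketch: overwrite the four pfrl files with the patched versions.
# Assumes pfrl_patch/ mirrors the pfrl/ directory layout (an assumption;
# adjust if your patch folder is organized differently).
import shutil
from pathlib import Path

PATCHED = [
    "replay_buffer.py",
    "agents/iqn.py",
    "experiments/evaluator.py",
    "wrappers/cast_observation.py",
]

def apply_patch(patch_dir="pfrl_patch", pfrl_dir="pfrl"):
    """Copy each patched file over its pfrl counterpart; return the files copied."""
    copied = []
    for rel in PATCHED:
        src, dst = Path(patch_dir) / rel, Path(pfrl_dir) / rel
        shutil.copy(src, dst)
        copied.append(str(dst))
    return copied
```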
 5. To train the policies, replace --seed 0 with 1, 2, ... in the commands below (WARNING: each run may take a few days to finish):
      >python pf_train_iqn_atari.py --seed 0 --steps 30000000 --env AsterixNoFrameskip-v4 --cvar-alpha 1.00
      >python pf_train_iqn_atari.py --seed 0 --steps 30000000 --env AsterixNoFrameskip-v4 --cvar-alpha 0.25
      >python pf_train_iqn_atari.py --seed 0 --steps 30000000 --env AsterixNoFrameskip-v4 --cvar-alpha 0.25 --cvar-static
   (the --cvar-static version uses our proposed approach)
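   The seed sweep in step 5 can be enumerated programmatically; the sketch below is a dry run that only prints the command lines (it reuses the script name and flags shown above, so adapt it if your setup differs):

```python
# Build (but do not run) the step-5 training commands for several seeds.
BASE = ["python", "pf_train_iqn_atari.py",
        "--steps", "30000000", "--env", "AsterixNoFrameskip-v4"]

def train_commands(seed):
    """The three variants: expectation (alpha=1), dynamic CVaR, static CVaR."""
    common = BASE + ["--seed", str(seed)]
    return [
        common + ["--cvar-alpha", "1.00"],
        common + ["--cvar-alpha", "0.25"],
        common + ["--cvar-alpha", "0.25", "--cvar-static"],  # proposed approach
    ]

if __name__ == "__main__":
    for seed in range(3):            # seeds 0, 1, 2
        for cmd in train_commands(seed):
            print(" ".join(cmd))     # pipe into a scheduler, or run via subprocess
```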

 6. To evaluate policies, run in ipython (replace <seed> with 0, 1, 2, ...):
      >run pf_eval_iqn_atari Asterix <seed> 1.0 0
      >run pf_eval_iqn_atari Asterix <seed> 0.25 0
      >run pf_eval_iqn_atari Asterix <seed> 0.25 1
   (the "0.25 1" is the cvar-static version)


===============================================

Appendix A & B

To produce Figures 6-9, run finite_env.py in ipython.

===============================================

Appendix C Modified Puddle World

1. Train and save models:
   -- for Expectation (replace <seed> with 1, 2, 3,...)
   > python puddle_env2.py dqr <seed> 100 0.02 50000 0
   
   -- for Dynamic (replace <seed> with 1, 2, 3,...)
   > python puddle_env2.py dqr <seed> 20 0.02 50000 0
   
   -- for Static (replace <seed> with 1, 2, 3,...)
   > python puddle_env2.py dqr4d <seed> 20 0.02 50000 0
   
2. Evaluate models and save results:
   -- for Expectation (replace <seed> with 1, 2, 3,...)
   > python puddle_env2.py dqr <seed> 100 0.02 50000 1
   
   -- for Dynamic (replace <seed> with 1, 2, 3,...)
   > python puddle_env2.py dqr <seed> 20 0.02 50000 1
   
   -- for Static (replace <seed> with 1, 2, 3,...)
   > python puddle_env2.py dqr4d <seed> 20 0.02 50000 1
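
The puddle_env2.py invocations above differ only in the method argument (dqr or dqr4d), the third argument (100 for Expectation, 20 for Dynamic/Static), and the final flag, which appears to select the mode (0 in the training step, 1 in the evaluation step, 2 in the plotting steps below). A small helper that enumerates the train/eval commands (dry run: it only prints; the 0.02 and 50000 arguments are kept exactly as given in this README):

```python
# Enumerate the puddle_env2.py commands from steps 1-2 (dry run: print only).
# The final argument is treated as a mode flag (0 = train, 1 = evaluate),
# matching the steps above; this interpretation is inferred from the README.
SETTINGS = [
    ("Expectation", "dqr",   "100"),
    ("Dynamic",     "dqr",   "20"),
    ("Static",      "dqr4d", "20"),
]

def puddle_cmd(method, third_arg, seed, mode):
    return ["python", "puddle_env2.py", method, str(seed),
            third_arg, "0.02", "50000", str(mode)]

if __name__ == "__main__":
    for mode in (0, 1):                      # train, then evaluate
        for name, method, third in SETTINGS:
            for seed in (1, 2, 3):
                print(name, "->", " ".join(puddle_cmd(method, third, seed, mode)))
```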


3. To produce Figure 10, run res_puddle2.py in ipython.


4. To plot Figure 11, run in ipython:
   > run puddle_env2 dqr 1 100 0.02 50000 2
   > run puddle_env2 dqr 2 100 0.02 50000 2
   > run puddle_env2 dqr 3 100 0.02 50000 2

5. To plot Figure 12, run in ipython:
   > run puddle_env2 dqr 1 20 0.02 50000 2
   > run puddle_env2 dqr 2 20 0.02 50000 2
   > run puddle_env2 dqr 3 20 0.02 50000 2

6. To plot Figure 13, run in ipython:
   > run puddle_env2 dqr4d 1 20 0.02 50000 2
   > run puddle_env2 dqr4d 2 20 0.02 50000 2
   > run puddle_env2 dqr4d 3 20 0.02 50000 2

===============================================

Appendix D Lunar Lander with Noisy Observation

To produce the plots in Figure 14:
 1. Follow steps 1-4 in the Atari section above.
 2. To train the policies, replace --seed 0 with 1, 2, ... in the commands below:
      >python pf_train_iqn_gym.py --seed 0 --steps 1000000 --env LunarLander-v2 --observation-noise --cvar-alpha 1.00
      >python pf_train_iqn_gym.py --seed 0 --steps 1000000 --env LunarLander-v2 --observation-noise --cvar-alpha 0.25
      >python pf_train_iqn_gym.py --seed 0 --steps 1000000 --env LunarLander-v2 --observation-noise --cvar-alpha 0.25 --cvar-static
   (the --cvar-static version uses our proposed approach)

 3. To evaluate policies, run in ipython (replace <seed> with 0, 1, 2, ...):
      >run pf_eval_iqn_gym LunarLander <seed> 1.0 0
      >run pf_eval_iqn_gym LunarLander <seed> 0.25 0
      >run pf_eval_iqn_gym LunarLander <seed> 0.25 1
   (the "0.25 1" is the cvar-static version)
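   The evaluation runs here (and in step 6 of the Atari section) share one positional pattern: environment name, seed, CVaR alpha, and a final 0/1 flag where 1 selects the static-CVaR model. A tiny helper that lists the three settings per seed (the function name is illustrative, not part of the codebase):

```python
def eval_settings(env, seed):
    """The three evaluations per seed: expectation, dynamic CVaR, static CVaR."""
    return [
        (env, str(seed), "1.0", "0"),
        (env, str(seed), "0.25", "0"),
        (env, str(seed), "0.25", "1"),   # "0.25 1" = the cvar-static version
    ]

if __name__ == "__main__":
    for args in eval_settings("LunarLander", 0):
        print("run pf_eval_iqn_gym", *args)   # paste into an ipython session
```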

