The supplementary materials include 1) the source code, 2) text files describing how to build the execution environment and listing the commands to reproduce each experiment, and 3) GIF files visualizing the learned policy for each task after 500K environment steps.