# Description for Supplementary Material

The videos show the performance of the physical and simulated robots in the **Experiments** section (**section 3**) of the paper. Files with `sim` in their names are simulation results; the rest are results from real robots.
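
The naming rule above can be applied programmatically. The following is a minimal sketch, assuming the only distinguishing feature is the substring `sim` in the file name; the example file names are hypothetical and do not reflect the actual names in this package.

```python
from pathlib import Path


def is_simulation(filename: str) -> bool:
    """Return True if the file name (without extension) contains 'sim'."""
    return "sim" in Path(filename).stem


def split_videos(filenames):
    """Split file names into (simulation, real-robot) lists, each sorted."""
    sim = sorted(f for f in filenames if is_simulation(f))
    real = sorted(f for f in filenames if not is_simulation(f))
    return sim, real


# Hypothetical file names, for illustration only.
example = ["video1.mp4", "video1_sim.mp4", "video4.mp4", "video4_sim.mp4"]
sim_files, real_files = split_videos(example)
```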

## Training Effect Comparison Experiment

This part corresponds to videos 0, 1, 2, and 3:

- video 0 shows the untrained performance of the robot;
- video 1 shows the robot's performance after training with DS;
- video 2 shows the robot's performance after training with PPO;
- video 3 shows the robot's performance when the model encounters a sudden drop in reward during training with DS.

The results are shown in `Figure 5` in **section 3.1 - results** of the paper.

## Robustness Test Experiment

This part corresponds to videos 1, 4, 5, 6, and 7:

- video 1 shows the robot's performance after training with DS, with complete observations;
- video 4 shows the robot's performance after training with DS, with gravity $G$ absent from the observations;
- video 5 shows the robot's performance after training with DS, with joint positions $q$ absent from the observations;
- video 6 shows the robot's performance after training with DS, with the last action $a_{t-1}$ absent from the observations;
- video 7 shows the robot's performance after training with DS, with both joint positions $q$ and the last action $a_{t-1}$ absent from the observations.

The results are shown in `Figure 6` and `Table 3` in **section 3.2 - results** of the paper.
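
As an illustration of the observation settings listed above, the sketch below maps each video to the observation components that are absent. It assumes that an absent component is replaced by zeros. This is only one possible way to implement such a test and is not confirmed by this document. The observation layout, its sizes, and the names `OBS_LAYOUT`, `ABLATIONS`, and `mask_observation` are hypothetical.

```python
# Hypothetical observation layout: component name -> index range.
# The sizes are illustrative and not taken from the paper.
OBS_LAYOUT = {
    "G": slice(0, 3),        # gravity
    "q": slice(3, 15),       # joint positions
    "a_prev": slice(15, 27), # last action a_{t-1}
}
OBS_DIM = 27

# Video number -> observation components that are absent (from the list above).
ABLATIONS = {
    1: (),
    4: ("G",),
    5: ("q",),
    6: ("a_prev",),
    7: ("q", "a_prev"),
}


def mask_observation(obs, absent):
    """Return a copy of obs with the absent components set to zero (assumption)."""
    masked = list(obs)
    for name in absent:
        part = OBS_LAYOUT[name]
        for i in range(part.start, part.stop):
            masked[i] = 0.0
    return masked


# Apply each setting to a dummy observation of ones.
full_obs = [1.0] * OBS_DIM
masked_obs = {video: mask_observation(full_obs, absent)
              for video, absent in ABLATIONS.items()}
```

Keeping the video-to-component mapping in a single table makes it straightforward to check that each video corresponds to exactly one observation setting.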
