To better illustrate the advantages of our approach over other methods, we provide the videos corresponding to Figure 5 of the paper. Videos with ''example1'' in their filename were generated from the first text prompt in Figure 5 (i.e., ''A figure dances and jumps excitedly, spreading joy and happiness all around.''). Videos with ''example2'' in their filename were generated from the second text prompt in Figure 5 (i.e., ''A person angrily paces around while thrusting their arms outward and upwards, expressing their intense anger.'').