Videos for Figures 5 and 6
On this page, we present the videos corresponding to Figures 5 and 6. As shown in the video for Figure 5, our method produces high-quality videos with a consistent background and without blurry hands or distorted fingers. In contrast, S2G and MYA produce inconsistent backgrounds, blurry hands, and distorted fingers. In addition, MYA often memorizes appearance features during training, so its generated videos replicate the memorized appearance instead of following the reference image, which leads to inconsistencies. More comparison videos are provided on the "More Videos for Comparisons" page.
In the video for Figure 6, the ablated variants of our model suffer from low visual quality, backgrounds inconsistent with the reference image, distorted hands, extra fingers, and hands that appear detached from the body. Moreover, their generated videos show significant motion inconsistencies, including severe shaking. Additional videos for the ablation studies are available on the "More Videos for Ablation Studies" page.
Please play each video with audio enabled to hear the input speech.
Figure 5 (video comparison): GT | S2G | MYA | Ours
Figure 6 (ablation videos): W/o Ref | W/o Motion | W/o First Stage | W/o Slow-Fast | Ours