Here we show qualitative results for the comparison in Section 5.2 of the paper. For our method, EquiVDM-base generates videos from warped noise without dense conditioning, while EquiVDM-full additionally uses dense conditioning (a soft-edge map). All other compared methods use dense conditioning (soft-edge map). Click on a video to view it in full resolution.
Here we show qualitative results for few-step video generation, comparing VACE (1.3B) with VACE-EquiVDM (1.3B). The videos show the quality of the generated content at different numbers of denoising steps (3, 5, and 10). Click on a video to view it in full resolution.
| 3-step VACE | 5-step VACE | 10-step VACE | 3-step VACE-EquiVDM | 5-step VACE-EquiVDM | 10-step VACE-EquiVDM |
|---|---|---|---|---|---|
In the following, we illustrate the i.i.d. property of each frame of the input warped noise. On the left, we present a zoomed-in warp of the first noise frame. On the right, we show the covariance matrices computed over a 10×10 window at the center of each frame. As shown, the covariance matrices are close to the identity matrix, indicating that the noise values within each frame remain approximately uncorrelated with unit variance after warping, consistent with the i.i.d. property.
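The covariance check described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. It assumes the covariance is estimated across many independent noise samples, with the 10×10 center window flattened into a 100-dimensional vector. The function name `center_window_covariance` is hypothetical, and plain Gaussian noise stands in for the warped noise, since the warping procedure itself is not described here.

```python
import numpy as np


def center_window_covariance(frames: np.ndarray, window: int = 10) -> np.ndarray:
    """Estimate the covariance of pixels inside a centered window.

    frames: array of shape (num_samples, H, W), one noise frame per sample.
    Returns a (window*window, window*window) covariance matrix.
    """
    n, h, w = frames.shape
    top = (h - window) // 2
    left = (w - window) // 2
    # Crop the centered window and flatten each sample into a vector.
    patch = frames[:, top:top + window, left:left + window]
    vectors = patch.reshape(n, window * window)
    # Columns are variables (pixels); rows are observations (samples).
    return np.cov(vectors, rowvar=False)


# Stand-in data (assumption): i.i.d. standard Gaussian frames instead of
# actual warped noise.
rng = np.random.default_rng(0)
frames = rng.standard_normal((20000, 16, 16))

cov = center_window_covariance(frames)
# Largest absolute deviation from the identity matrix.
max_dev = np.abs(cov - np.eye(cov.shape[0])).max()
print(f"max |cov - I| = {max_dev:.4f}")
```

To run the same check on real data, replace `frames` with a stack of warped noise samples for a single frame index. A small `max_dev` means the pixels in the window are close to uncorrelated with unit variance.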