Video Comparison

Here we show qualitative results for the comparison in Section 5.2 of the paper. For our method, EquiVDM-base generates videos from warped noise without dense conditioning, while EquiVDM-full additionally uses dense conditioning (a soft-edge map). All other compared methods also use dense conditioning (a soft-edge map). Click a video to view it in full resolution.

Video panels: CtrlVid, T2V-Zero, CtrlAdapter, EquiVDM-base, EquiVDM-full, Ground Truth, Noise

Few-Step Video Generation

Here we show qualitative results for few-step video generation, comparing VACE (1.3B) with VACE-EquiVDM (1.3B). The videos show the quality of the generated content at different numbers of denoising steps (3, 5, and 10). Click a video to view it in full resolution.

Video panels: 3-step VACE, 5-step VACE, 10-step VACE, 3-step VACE-EquiVDM, 5-step VACE-EquiVDM, 10-step VACE-EquiVDM

Gaussian Characteristics of Input Noise After Warping

In the following, we illustrate the i.i.d. property of each frame of the input warped noise. On the left, we present a zoomed-in view of the warping of the first noise frame. On the right, we show the covariance matrices computed over a 10×10 window at the center of each frame. As shown, the covariance matrices are close to the identity matrix, indicating that the noise values within each frame remain approximately uncorrelated with unit variance after warping; for jointly Gaussian noise, this implies approximate independence.

Covariance of noise distribution after warping
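
The covariance check described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the warping algorithm from the paper: the helper `window_covariance`, the sample count, and the frame size are hypothetical, and the "warp" is a stand-in integer pixel shift (which merely permutes pixels). For contrast, a half-pixel shift implemented by naive linear interpolation is included, since averaging neighboring pixels reduces the variance and correlates adjacent values.

```python
import numpy as np

def window_covariance(frames, size=10):
    """Empirical covariance of the pixels in a centered size x size window.

    frames: array of shape (num_samples, H, W), one noise frame per sample.
    Returns a (size*size, size*size) covariance matrix.
    """
    _, h, w = frames.shape
    top, left = (h - size) // 2, (w - size) // 2
    patch = frames[:, top:top + size, left:left + size]
    flat = patch.reshape(len(frames), -1)  # (num_samples, size*size)
    return np.cov(flat, rowvar=False)      # columns are the variables

rng = np.random.default_rng(0)
noise = rng.standard_normal((20000, 16, 16))  # i.i.d. N(0, 1) frames

# Stand-in warp (assumption): an integer pixel shift with wrap-around.
shifted = np.roll(noise, shift=(3, -2), axis=(1, 2))

# Contrast case: half-pixel horizontal shift via linear interpolation,
# i.e. the average of each pixel and its horizontal neighbor.
blended = 0.5 * (noise + np.roll(noise, shift=1, axis=2))

identity = np.eye(100)
err_shifted = np.abs(window_covariance(shifted) - identity).max()
err_blended = np.abs(window_covariance(blended) - identity).max()
print(f"max |cov - I|, integer shift: {err_shifted:.3f}")
print(f"max |cov - I|, interpolated:  {err_blended:.3f}")
```

With enough samples, the first deviation shrinks toward zero, while the second stays near 0.5 because the interpolated noise has variance 0.5 rather than 1. A warping scheme that preserves the Gaussian property should behave like the first case.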