|
We provide video comparisons between our method and the other methods discussed in the paper. For LTX-Video, we follow the diffusers example and generate samples using the same prompts. |
|||
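
As a rough illustration of the LTX-Video setup mentioned above, the following is a minimal sketch of sampling through the Hugging Face diffusers `LTXPipeline`. The resolution, frame count, step count, negative prompt, and frame rate are assumptions taken from typical diffusers usage, not settings confirmed for the comparisons on this page. The helper names `ltx_call_kwargs` and `generate_ltx_video` are hypothetical.

```python
from typing import Any, Dict

# Two of the prompts used in the comparison table on this page.
PROMPTS = [
    "A cat sitting at a grand piano, elegantly playing a classical piece with its paws.",
    "A corgi vlogging itself in tropical Maui.",
]


def ltx_call_kwargs(prompt: str) -> Dict[str, Any]:
    """Build pipeline keyword arguments. All values except the prompt are assumptions."""
    return {
        "prompt": prompt,
        "negative_prompt": "worst quality, inconsistent motion, blurry, jittery, distorted",
        "width": 704,               # assumed; chosen as a multiple of 32
        "height": 480,              # assumed; chosen as a multiple of 32
        "num_frames": 161,          # assumed; of the form 8k + 1
        "num_inference_steps": 50,  # assumed
    }


def generate_ltx_video(prompt: str, out_path: str) -> None:
    """Sample one video with LTX-Video and write it to out_path (needs a CUDA GPU)."""
    # Heavy imports are kept local so the helper above can be used without a GPU.
    import torch
    from diffusers import LTXPipeline
    from diffusers.utils import export_to_video

    pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    frames = pipe(**ltx_call_kwargs(prompt)).frames[0]
    export_to_video(frames, out_path, fps=24)  # fps is an assumption
```

Calling `generate_ltx_video(PROMPTS[0], "ltx_cat_piano.mp4")` would then produce one sample; reusing the same prompt strings across all models keeps the comparison like-for-like.
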
| LTX-Video[1] | CogVideoX-2B[2] | Wan2.1-1.3B[3] | Ours |
|---|---|---|---|
|
Prompt:
3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant,
enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and
a bush.
|
|||
|
Prompt:
A cat sitting at a grand piano, elegantly playing a classical piece with its paws.
|
|||
|
Prompt:
A corgi vlogging itself in tropical Maui.
|
|||
|
Prompt:
A movie trailer featuring the adventures of the 30-year-old spaceman wearing a red wool knitted
motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.
|
|||
|
Prompt:
A pair of lovebirds preening each other's feathers.
|
|||
|
Prompt:
A skeleton wearing a flower hat and sunglasses dances in the wild at sunset.
|
|||
|
In this section, we provide more results of our model. |
||
|
Below, we provide more results of our mobile model. |
||||
[1] HaCohen, Yoav, et al. "LTX-Video: Realtime Video Latent Diffusion." https://arxiv.org/abs/2501.00103 (2024).
[2] Yang, Zhuoyi, et al. "CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer." ICLR (2025).
[3] Team Wan, et al. "Wan: Open and Advanced Large-Scale Video Generative Models." https://arxiv.org/abs/2503.20314 (2025).