Training-Free Efficient Video Generation via Dynamic Token Carving
Supplementary Materials
Videos are compressed with ffmpeg to reduce file size.
More Showcases
Jenga+HunyuanI2V / 338s (4 videos)
Jenga-3Stage / 157s (5 videos)
Jenga-Turbo / 225s (6 videos)
Jenga+AccVideo / 76s (3 videos)
Jenga+Wan2.1-1.3B / 24s (3 videos)
Comparisons
The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it's tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
A movie trailer featuring the adventures of the 30 year old space man wearing a red knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
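Each comparison grid lists a method followed by its generation time in seconds. To read those times as speedups over the HunyuanVideo baseline, the short sketch below divides the baseline time by each method's time. The numbers are copied from the grids; the dictionary and function names are illustrative only, and the ratios reflect wall-clock time on whatever hardware produced the grids, which this page does not state.

```python
# Generation times (seconds), copied from the comparison grids above.
TIMES_S = {
    "HunyuanVideo": 1625,
    "TeaCache-fast": 708,
    "SVG": 908,
    "Jenga-Base": 347,
    "Jenga-Turbo": 225,
    "Jenga-3stage": 157,
}


def speedups(times, baseline="HunyuanVideo"):
    """Return each method's wall-clock speedup over `baseline`, rounded to 2 decimals."""
    base = times[baseline]
    return {name: round(base / t, 2) for name, t in times.items()}


for name, s in speedups(TIMES_S).items():
    print(f"{name:>14}: {s:5.2f}x")
```

With these times, Jenga-3stage is roughly 10x faster than the baseline and Jenga-Turbo roughly 7x.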
Ablation Study
1. Effect of different text-attention amplification bias values on the field of view (FOV)
negative bias
zero bias
low bias
mid bias
high bias
w/o bias: abnormal FOV (360P first stage)
original 1 stage (720P first stage)
bias with 2 stages (540P first stage)
bias with 3 stages (360P first stage)
w/o bias: abnormal FOV (360P first stage)
original 1 stage (720P first stage)
bias with 2 stages (540P first stage)
bias with 3 stages (360P first stage)
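This page does not say how the text-attention amplification bias is applied. One common way to amplify attention toward a subset of keys is to add a scalar to their pre-softmax logits. The sketch below assumes that form: a single bias added to the logits of text-token keys. It shows how negative, zero, and positive values move attention mass away from or toward the text. The function names, toy logits, and bias levels are hypothetical and do not reproduce the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(logits, is_text_key, bias):
    """Hypothetical: add `bias` to the logits of text-token keys, then softmax.

    A positive bias shifts attention mass toward text keys, a negative bias
    shifts it away, and a zero bias leaves the weights unchanged.
    """
    shifted = [x + bias if t else x for x, t in zip(logits, is_text_key)]
    return softmax(shifted)


def text_mass(logits, is_text_key, bias):
    """Total attention weight that one query places on text-token keys."""
    w = biased_attention_weights(logits, is_text_key, bias)
    return sum(wi for wi, t in zip(w, is_text_key) if t)


# Toy example: one query attending to 6 video keys and 2 text keys.
logits = [0.5, 0.1, -0.2, 0.3, 0.0, 0.4, 0.2, -0.1]
is_text = [False] * 6 + [True] * 2
for name, b in [("negative", -1.0), ("zero", 0.0), ("low", 0.5), ("mid", 1.0), ("high", 2.0)]:
    print(f"{name:>8} bias: text attention mass = {text_mass(logits, is_text, b):.3f}")
```

Under this additive form, the text mass is a strictly increasing function of the bias. The ablation varies this one knob.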
2. Effectiveness of the Adjacency Mask
w/o adjacency mask
with adjacency mask
w/o adjacency mask
with adjacency mask
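The construction of the adjacency mask is not described on this page. The sketch below assumes that video tokens are grouped into 3D blocks indexed by (t, h, w), and that a block-sparse attention selects a subset of key blocks for each query block. Under that assumption, the adjacency mask forces every block to also attend to its immediate neighbours, so that content across block boundaries stays consistent. The grid shape, the 3x3x3 neighbourhood, and all function names are hypothetical illustrations.

```python
from itertools import product


def block_coords(grid):
    """Enumerate 3D block coordinates (t, h, w) in raster order."""
    T, H, W = grid
    return list(product(range(T), range(H), range(W)))


def adjacency_mask(grid):
    """Block-level boolean mask: mask[i][j] is True when block j lies within one
    step of block i along every axis (a 3x3x3 neighbourhood, including i itself)."""
    coords = block_coords(grid)
    return [[all(abs(a - b) <= 1 for a, b in zip(ci, cj)) for cj in coords] for ci in coords]


def combine(selected, adjacency):
    """Union of a (hypothetical) top-k block selection with the adjacency mask."""
    return [[s or a for s, a in zip(rs, ra)] for rs, ra in zip(selected, adjacency)]


# Usage: a 2 x 3 x 3 block grid (18 blocks) and an empty top-k selection.
grid = (2, 3, 3)
mask = adjacency_mask(grid)
selected = [[False] * len(mask) for _ in mask]
final = combine(selected, mask)
print(sum(final[0]), "blocks visible to corner block (0, 0, 0)")
```

Because neighbourhood membership is symmetric, the mask is symmetric. A corner block sees 8 blocks, and a block in the middle of an h-w face sees 18.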
Limitation Analysis
Main failure case: latent misalignment when resizing
A. hand with incorrect content
A. with enhanced prompt
B. boundary misalignment
B. clear boundary with enhanced prompt
Alternative solution: use enhanced prompts, or generate content with complex scenes and textures
With enhanced prompts, we can eliminate this quality degradation in the generated video, even with a much smaller initial resolution (360P).
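To put the 360P initial resolution in perspective, the sketch below estimates the relative token count at each first-stage resolution used in the ablation (720P, 540P, 360P). It assumes the same aspect ratio at every resolution and a token count proportional to pixel area, ignoring rounding from the VAE and patch sizes. It also uses the standard fact that dense self-attention cost grows quadratically with token count. The function name is illustrative.

```python
def relative_tokens(height, ref_height=720):
    """Approximate fraction of tokens at `height`P relative to `ref_height`P.

    Assumes equal aspect ratio and token count proportional to pixel area.
    """
    return (height / ref_height) ** 2


for h in (720, 540, 360):
    frac = relative_tokens(h)
    # Dense self-attention FLOPs scale with the square of the token count.
    print(f"{h}P: {frac:.4f} of 720P tokens, ~{frac ** 2:.4f} of its attention cost")
```

Under these assumptions, a 360P first stage processes about a quarter of the 720P tokens and needs roughly 1/16 of the dense attention cost per step.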
dynamic scene
content with detailed textures
static scene with enhanced prompt
complex scene with enhanced prompt