DC-VideoGen: Efficient Video Generation with
Deep Compression Video Autoencoder

High-Resolution Videos Generated by Our DC-VideoGen-Wan

A corgi perched on a branch,
tensed and ready to leap to the ground.

An astronaut and a knight
embrace in a desolate landscape.

An astronaut rides a horse
across the moon towards Earth.

I2V 720P

A cinematic video of the text 'DC-VideoGen' formed by clouds.

A poised blonde woman gracefully sips tea from a delicate cup.

I2V 1080P

A time-lapse video captures a flower's delicate and detailed bloom.

T2V 2160P

Video Autoencoder Reconstruction Visualization

Input
Shape: 80x256x256

LTX Video VAE (Causal)
Configuration: f32t8c128
Compression Ratio: 192
PSNR: 31.12

Video DC-AE (Non-Causal) w/o tiling
Configuration: f32t4c128
Compression Ratio: 96
PSNR: 31.52

Video DC-AE (Non-Causal) w/ tiling
Configuration: f32t4c128
Compression Ratio: 96
PSNR: 33.65

DC-AE-V (Chunk-Causal)
Configuration: f32t4c32
Compression Ratio: 192
PSNR: 32.72
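
The compression ratios listed above appear to follow from the configuration strings if `f32t4c128` is read as a 32x spatial downsampling factor, a 4x temporal downsampling factor, and 128 latent channels, applied to a 3-channel RGB input. That reading is an assumption, since the notation is not defined here, and the helper names below are hypothetical. A minimal sketch of the arithmetic:

```python
import re


def parse_config(config: str) -> tuple[int, int, int]:
    """Split a string like 'f32t4c128' into (spatial, temporal, channels).

    Assumed meaning: f = spatial downsampling factor per side,
    t = temporal downsampling factor, c = latent channels.
    """
    match = re.fullmatch(r"f(\d+)t(\d+)c(\d+)", config)
    if match is None:
        raise ValueError(f"unrecognized configuration: {config!r}")
    f, t, c = (int(group) for group in match.groups())
    return f, t, c


def compression_ratio(config: str, input_channels: int = 3) -> float:
    """Input values per latent value: (channels_in * f * f * t) / c."""
    f, t, c = parse_config(config)
    return input_channels * f * f * t / c


if __name__ == "__main__":
    for name in ("f32t8c128", "f32t4c128", "f32t4c32"):
        print(name, compression_ratio(name))
```

Under this reading, `f32t8c128` evaluates to 192 and `f32t4c128` to 96, matching the listed values. The same formula gives 384 for `f32t4c32` rather than the listed 192, so either the notation is read differently for DC-AE-V or one of the two listed values departs from the formula; the source does not say which.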

Image-to-Video (I2V) Visualization

Wan2.1-I2V-14B
(27.88 mins/video)

DC-VideoGen-Wan2.1-I2V-14B
(3.67 mins/video)

Prompt: A battle-scarred robot walks through a desolate city ruin.

Prompt: A trail runner sprints through a sun-dappled forest, face set with determination.

Prompt: An eaglet soars high above a vast, vibrant forest canopy.

Prompt: A rugged off-road vehicle speeds through a sunlit forest track.

Text-to-Video (T2V) Visualization

Wan2.1-T2V-14B
(27.52 mins/video)

DC-VideoGen-Wan2.1-T2V-14B
(3.58 mins/video)

Prompt: A girl on a ship's deck, clutching a letter, looks back with sad determination.

Prompt: Minecraft with the most gorgeous high res 8k texture pack ever.

Prompt: Three video game characters team up in a vibrant arcade.

Prompt: A man is skiing down thick layers of clouds. Towering mountain peaks are faintly visible.
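
The per-video latencies quoted in the I2V and T2V comparisons above imply the following speedups. This is a small sketch of the arithmetic only: the timings are copied from the labels above, while the dictionary layout and the `speedup` function name are illustrative.

```python
# Minutes per video, as quoted in the comparison labels: (baseline, DC-VideoGen).
LATENCY_MINUTES = {
    "I2V": (27.88, 3.67),  # Wan2.1-I2V-14B vs DC-VideoGen-Wan2.1-I2V-14B
    "T2V": (27.52, 3.58),  # Wan2.1-T2V-14B vs DC-VideoGen-Wan2.1-T2V-14B
}


def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """Return how many times faster the accelerated model is."""
    if accelerated_minutes <= 0:
        raise ValueError("accelerated_minutes must be positive")
    return baseline_minutes / accelerated_minutes


if __name__ == "__main__":
    for task, (baseline, accelerated) in LATENCY_MINUTES.items():
        print(f"{task}: {speedup(baseline, accelerated):.1f}x faster")
```

With the quoted timings this works out to roughly 7.6x for I2V and 7.7x for T2V.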