Lightning Video: Building Compact Diffusion Transformers for High-Fidelity On-Device Video Generation

12 Sept 2025 (modified: 14 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0
Keywords: Video Generation; Mobile Devices
Abstract: Recent advances in Diffusion Transformers (DiTs) have enabled the generation of highly realistic video content, but state-of-the-art models often require billions of parameters, making them impractical for deployment on resource-constrained edge devices, such as smartphones. In this work, we introduce a systematic approach to designing lightweight yet powerful video DiTs tailored for edge scenarios. Our framework centers on three key components: (1) a Taylor-expansion–based pruning initialization that allows flexible model rescaling and rapid capability recovery with limited data; (2) a staged, data-efficient training protocol that couples this initialization with curated datasets and targeted optimization schedules; and (3) a distribution-matching distillation strategy that substantially reduces inference steps while preserving generation quality. We present **Lightning Video**, a 0.8B-parameter model that achieves competitive performance against billion-scale baselines while supporting native execution on edge devices (e.g., iPhone 16 Pro). These results demonstrate the feasibility of delivering high-quality video generation directly on end-user devices, opening new opportunities for practical mobile and creative applications.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 4470
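
The abstract names a Taylor-expansion-based pruning initialization but does not give the criterion. As a rough illustration only, the sketch below uses a common first-order Taylor importance score, |sum(g * w)| per channel, which estimates how much the loss would change if that channel were zeroed. The function names, the per-channel grouping, and the toy numbers are all assumptions for illustration and are not taken from the submission.

```python
from typing import List, Sequence


def taylor_importance(weights: Sequence[Sequence[float]],
                      grads: Sequence[Sequence[float]]) -> List[float]:
    """First-order Taylor score per channel: |sum_i g_i * w_i|.

    Hypothetical helper: approximates the loss change from zeroing a channel.
    """
    return [abs(sum(w * g for w, g in zip(w_row, g_row)))
            for w_row, g_row in zip(weights, grads)]


def channels_to_keep(scores: Sequence[float], keep: int) -> List[int]:
    """Return indices of the `keep` highest-scoring channels, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Toy example: 4 channels with 3 weights each (values are made up).
weights = [[0.5, -0.2, 0.1], [0.01, 0.02, -0.01], [1.0, 1.0, 1.0], [-0.3, 0.3, 0.0]]
grads = [[0.1, 0.1, 0.1], [0.5, 0.5, 0.5], [0.2, -0.1, 0.3], [0.4, 0.4, 0.4]]

scores = taylor_importance(weights, grads)
kept = channels_to_keep(scores, keep=2)
print(kept)
```

In a real pruning pipeline the gradients would come from a few calibration batches, and the surviving channels would initialize the smaller model before the recovery training the abstract describes.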