MoViE: Mobile Diffusion for Video Editing

Anonymous Authors

Abstract

Recent progress in diffusion-based video editing has shown remarkable potential, and these techniques are increasingly used in practical applications. However, these methods remain prohibitively expensive and are particularly challenging to deploy on mobile devices. In this study, we introduce a series of optimizations that make mobile video editing feasible. Building upon an existing image editing model, we first optimize its architecture and incorporate a lightweight autoencoder. Subsequently, we propose a new classifier-free guidance distillation with multiple modalities, resulting in an on-device speed-up. Finally, we reduce the number of sampling steps to one (10× speed-up) by introducing a novel adversarial distillation scheme that, in contrast to prior approaches, preserves the controllability of the editing process. Collectively, these optimizations enable video editing at 12 frames per second on mobile devices while maintaining high editing quality.
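The abstract does not spell out the guidance formulation that is distilled. As a minimal sketch, assuming the base model uses the two-condition classifier-free guidance common in instruction-based image editing (one scale for the input image, one for the text prompt), each sampling step combines three noise estimates, and a guidance-distilled student would be trained to reproduce the combined estimate in a single evaluation. The names `multimodal_cfg`, `toy_eps`, `s_image`, and `s_text` are hypothetical and chosen for illustration only; `toy_eps` is a stand-in, not the actual network.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical denoiser signature (an assumption, not the paper's API):
# (noisy latent, image condition or None, text condition or None) -> noise estimate
Denoiser = Callable[[Vector, Optional[Vector], Optional[str]], Vector]


def multimodal_cfg(eps: Denoiser, z: Vector, image: Vector, text: str,
                   s_image: float, s_text: float) -> Vector:
    """Combine three noise estimates using separate image and text guidance scales."""
    e_uncond = eps(z, None, None)   # neither condition
    e_image = eps(z, image, None)   # image condition only
    e_full = eps(z, image, text)    # image and text conditions
    return [u + s_image * (i - u) + s_text * (f - i)
            for u, i, f in zip(e_uncond, e_image, e_full)]


calls: List[str] = []  # records every denoiser evaluation


def toy_eps(z: Vector, image: Optional[Vector], text: Optional[str]) -> Vector:
    """Toy stand-in for a denoising network, used only to exercise the function."""
    calls.append("eval")
    shift = (1.0 if image is not None else 0.0) + (2.0 if text is not None else 0.0)
    return [v + shift for v in z]


guided = multimodal_cfg(toy_eps, [0.0, 1.0], image=[0.5, 0.5], text="Add snow",
                        s_image=1.5, s_text=7.5)
print(guided)      # [16.5, 17.5]
print(len(calls))  # 3 evaluations for one guided step
```

Under this assumption, the teacher needs three network evaluations per step, which is the cost that guidance distillation aims to remove while keeping both scales available as controls.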

Qualitative Results on Faces Dataset

Input | "In chinese ink style" | "In caricature style" | "In pop art style"
Input | "Turn him into silver surfer" | "Add wrinkles" | "Add sunglasses"
Input | "In pixar 3d style" | "Turn him into vampire" | "In pencil drawing style"
Input | "Make him bronze" | "Turn him into hulk" | "In Minecraft style"

Comparison to the Base Model on Faces Dataset

Input Video | Base Model | MoViE
Input | "In Monet style" | "In Monet style"
Input | "Make him wooden" | "Make him wooden"

Comparison to SOTA Methods on DAVIS

Input Video | InsV2V | Rerender-a-Video | TokenFlow | MoViE
Input | "Make it desert" | "Make it desert" | "Make it desert" | "Make it desert"
Input | "Turn the swan into flamingo" | "Turn the swan into flamingo" | "Turn the swan into flamingo" | "Turn the swan into flamingo"
Input | "Add grass" | "Add grass" | "Add grass" | "Add grass"
Input | "Add snow" | "Add snow" | "Add snow" | "Add snow"
Input | "Make him zombie" | "Make him zombie" | "Make him zombie" | "Make him zombie"
Input | "Make him yeti" | "Make him yeti" | "Make him yeti" | "Make him yeti"
Input | "Make her hair blonde" | "Make her hair blonde" | "Make her hair blonde" | "Make her hair blonde"
Input | "Add fire" | "Add fire" | "Add fire" | "Add fire"