Keywords: Diffusion Model, Video Editing
Abstract: Recent work on Text-to-Image (T2I) models has shown potential for text-driven video editing using latent diffusion models (LDMs). However, text prompts remain a crude abstraction of visual signals, leaving fine-grained, controllable video editing an unresolved challenge. In this study, we introduce the Latent prompt based Image-driven Video Editing (LIVE) framework to unlock the capabilities of pretrained LDMs for precise editing control. At the heart of LIVE lies a novel Latent Prompt Mechanism, which uses the latent code of a reference image as a prompt to enrich visual details. We first revisit the attention mechanisms in LDMs and enhance them to facilitate comprehensive interactions between video frames and latent prompts in both the spatial and temporal dimensions. We also devise a training process that fine-tunes components such as the latent prompts, textual embeddings, and LDM parameters so that they effectively represent the provided video and image within the diffusion space. These optimized elements are then combined to generate the edited video, enabling seamless substitution of an object in each frame with a user-specified target while maintaining visual consistency across frames. Our experiments on real-world videos demonstrate the efficacy of the LIVE framework and its promising applications in image-driven video editing tasks.
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4410