4EV: Adaptive Video Editing With Spatial Temporal Dynamics and Motion Pathways

Namrata Patel, Lakshmi Priya Ramisetty, Aditya Singh Parmar, Jialu Li, Youshan Zhang

Published: 01 Jan 2025, Last Modified: 26 Mar 2026, Venue: IEEE Access, License: CC BY-SA 4.0
Abstract: This paper presents 4EV, an end-to-end video editing model that extends Stable Diffusion and text-to-image (T2I) frameworks to generate and edit videos with fine-grained motion dynamics, including object movement, path navigation, background transitions, and zoom effects. 4EV uses spatial-temporal attention to produce smooth transitions and consistent object appearances, and attention map injection for precise alignment. The contributions are: 1) Motion4EV, a custom dataset of videos and prompts that supports precise motion control, generated through experiments on various motion dynamics to ensure diversity; 2) an enhanced 4EV architecture with attention blocks for improved temporal and spatial refinement; and 3) experimental results showing higher CLIP scores and better motion consistency than state-of-the-art models. These contributions establish 4EV as a scalable framework for advanced video generation and editing. The source code is available at https://github.com/npp058/videogen
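The abstract does not say how motion consistency is computed. As a rough illustration only, one common proxy is the mean cosine similarity between embeddings of consecutive frames. The sketch below assumes that definition; the function names `cosine` and `frame_consistency` and the toy two-dimensional vectors are hypothetical stand-ins for per-frame image embeddings (for example, from a CLIP image encoder) and are not taken from the 4EV code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def frame_consistency(frame_embeddings):
    """Mean cosine similarity over consecutive pairs of frame embeddings.

    Assumed proxy for temporal consistency; not necessarily the metric
    reported in the paper.
    """
    pairs = list(zip(frame_embeddings, frame_embeddings[1:]))
    if not pairs:
        raise ValueError("need at least two frames")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy stand-ins for per-frame embeddings (hypothetical values).
steady = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]    # identical frames
flicker = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # alternating, orthogonal frames

print(frame_consistency(steady) > frame_consistency(flicker))
```

Under this assumed definition, a clip whose frames barely change scores near 1, while a flickering clip scores lower, which is why such a measure is usually reported alongside a text-alignment score rather than on its own.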