V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation

Published: 22 Sept 2025, Last Modified: 22 Sept 2025
Venue: WiML @ NeurIPS 2025
License: CC BY 4.0
Keywords: video transitions, video style, transition recommendation, video editing
Abstract: The exponential growth of digital video content, spanning professional studios, independent creators, and social media platforms, has heightened the need for rapid, flexible, and high-quality video editing methods. Visual transitions are crucial storytelling tools that connect clips, signal shifts in time or mood, and establish stylistic consistency across production styles such as cinematic film, documentary, vlog, or animation. Yet crafting transitions that balance narrative flow and style is a complex, manual process that often requires expert skill, and existing tools lack the content-awareness and flexibility to support diverse creative needs. We introduce V-Trans4Style, a novel learning-based framework that automates visual transition recommendation by integrating content and production style awareness. At its core is a transformer-based encoder-decoder network trained on a large annotated video dataset to learn temporally and visually consistent transitions from video sequences. Style adaptation is achieved via a separate style conditioning module, which operates at inference to iteratively refine the trained encoder’s latent representations and align the transitions produced by the decoder with user-specified styles, from cinematic to vlog, without retraining. This two-stage, bottom-up architecture ensures that the recommended transitions reinforce both the natural flow of the video and its intended visual identity, moving beyond static or domain-specific solutions. To support training and evaluation, we release AutoTransition++, a dataset of 6,000 videos spanning five production styles, with over 1,300 style-verified samples. Empirical tests show that V-Trans4Style improves transition recall and ranking by up to 80% and boosts style similarity by 12% compared to baselines. A user study with 102 participants confirms the effectiveness of style conditioning: over 70% preferred the videos refined with it, judging them a better match for the intended style.
We hope our work lays a foundation for further exploration and deeper understanding of video production styles and their interplay with editing elements, enabling richer and more personalized storytelling experiences.
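The inference-time style conditioning described in the abstract (refining latent representations while the trained network stays fixed) can be illustrated with a toy sketch. This is not the V-Trans4Style implementation: the linear decoder, the squared-error style objective, the plain gradient-descent update, and every name below (`decode`, `style_loss`, `refine_latent`, `W`, `target`) are assumptions chosen only to show the general idea of updating a latent without retraining weights.

```python
# Toy sketch of inference-time latent refinement (all names hypothetical).
# A frozen linear "decoder" maps a latent vector to transition scores; only the
# latent is updated so that the scores move toward a style target.

def decode(W, z):
    """Frozen decoder: scores[i] = sum_j W[i][j] * z[j]."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

def style_loss(W, z, target):
    """Squared distance between decoded scores and the style target."""
    return sum((s - t) ** 2 for s, t in zip(decode(W, z), target))

def refine_latent(W, z, target, steps=200, lr=0.05):
    """Iteratively update z only; the weights W are never modified."""
    z = list(z)
    for _ in range(steps):
        residual = [s - t for s, t in zip(decode(W, z), target)]
        # Gradient of the squared loss with respect to z: 2 * W^T * residual
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(z))]
        z = [z_j - lr * g_j for z_j, g_j in zip(z, grad)]
    return z

W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 transition types, 2-dim latent
z0 = [0.2, 0.9]                            # stand-in for an encoder output
target = [1.0, 0.0, 0.5]                   # hypothetical style preference
z1 = refine_latent(W, z0, target)
print(style_loss(W, z0, target), style_loss(W, z1, target))
```

In this toy setting the style loss shrinks with each step while `W` is untouched, which mirrors the "without retraining" property the abstract claims; the actual module presumably uses a learned style similarity measure rather than a fixed target vector.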
Submission Number: 362