Prompt Relay: Inference-Time Temporal Prompt Routing for Multi-Event Video Generation

Published: 27 Apr 2026, Last Modified: 27 Apr 2026, Venue: J2A Poster, Readers: Everyone, License: CC BY 4.0
Keywords: video generation, diffusion models, temporal control, multi-event generation, attention, inference-time control
Paper Track: Extended Abstract (non-archival)
TL;DR: We introduce Prompt Relay, an inference-time, plug-and-play attention routing method that enables movie-grade, multi-event video generation.
Abstract: Video diffusion models have achieved remarkable progress in generating high-quality videos. However, these models struggle to represent the temporal succession of multiple events in real-world videos, and they lack explicit mechanisms to control when semantic concepts appear, how long they persist, and the order in which events occur. Such control is especially important for movie-grade synthesis, where coherent storytelling depends on precise timing, duration, and transitions between events. When a single paragraph-style prompt describes a sequence of complex events, models often exhibit temporal entanglement: semantics intended for different moments interfere with one another, resulting in poor text-video alignment. To address these limitations, we propose Prompt Relay, an inference-time method that enables fine-grained temporal control in multi-event video generation. We apply a penalty in the cross-attention mechanism to regulate how each query attends to keys intended for different moments in the video. This significantly improves temporal prompt alignment, reduces semantic interference, and improves visual quality.
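The cross-attention penalty described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the form of the penalty, so everything below is an assumption made for illustration: the function names `temporal_penalty` and `routed_cross_attention`, the per-segment frame windows, the linear distance-based penalty, and the `strength` parameter are all hypothetical and are not taken from the paper.

```python
import numpy as np

def temporal_penalty(frame_idx, key_segment, windows, strength=4.0):
    """Hypothetical penalty matrix of shape (num_queries, num_keys).

    frame_idx:   (Q,) frame index of each video query token
    key_segment: (K,) index of the prompt segment each text key belongs to
    windows:     (S, 2) inclusive [start, end] frame window per segment
    """
    start = windows[key_segment, 0][None, :]  # (1, K)
    end = windows[key_segment, 1][None, :]    # (1, K)
    f = frame_idx[:, None]                    # (Q, 1)
    # Distance in frames from the query's frame to the key's window (0 inside it).
    dist = np.maximum(start - f, 0) + np.maximum(f - end, 0)
    # Assumed linear penalty; the paper may use a different shape.
    return strength * dist

def routed_cross_attention(q, k, v, penalty):
    """Scaled dot-product cross-attention with the penalty subtracted from the logits."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits - penalty
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy setup: 8 frames with 2 query tokens each, and two prompt segments of 3 text keys each.
rng = np.random.default_rng(0)
frame_idx = np.repeat(np.arange(8), 2)       # (16,)
key_segment = np.array([0, 0, 0, 1, 1, 1])   # (6,)
windows = np.array([[0, 3], [4, 7]])         # segment 0 covers frames 0-3, segment 1 covers 4-7
q = rng.normal(size=(16, 32))
k = rng.normal(size=(6, 32))
v = rng.normal(size=(6, 32))

penalty = temporal_penalty(frame_idx, key_segment, windows)
out, weights = routed_cross_attention(q, k, v, penalty)
print(out.shape, weights.shape)
```

In this sketch, queries from early frames concentrate their attention on the keys of the first prompt segment, and queries from late frames on the second. Because the penalty grows with distance rather than acting as a hard mask, frames near a window boundary can still attend weakly to the neighboring segment; that is one plausible design choice, not a claim about the authors' method.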
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 7