Our method achieves the best balance of motion quality and identity consistency across shots.
Ours (top row) demonstrates improved motion richness and identity preservation.
VideoCrafter2 (second row) shows diverse motion but inconsistent characters.
ConsiS (third row), a naive implementation of ConsiStory, exhibits impaired identity consistency and motion artifacts.
ConsiS +Uncond (fourth row) adds feature injection to the unconditional denoising, which resolves the motion artifacts but reduces motion magnitude and compromises identity.
Q ConsiS (fifth row) couples each frame with a single frame in an anchor video, which allows some natural motion, though the motion is partially synchronized, and improves identity preservation.