Ours (top row) balances character consistency and natural motion.
VideoCrafter2 (second row) shows diverse motion but inconsistent characters.
Full Q Preservation (third row) directly injects Q tokens from the vanilla model without flow-based processing, preserving original motion but losing character consistency.
No Q Intervention (bottom row) maintains strong character consistency but suffers from degraded motion and unwanted motion synchronization across shots.