Keywords: text-to-video generation; multi-character personalization
TL;DR: Our work explores multi-character text-to-video generation (e.g., mixing Tom and Jerry with Mr. Bean), preserving individual identities and personalized motions while enabling smooth, natural interactions.
Abstract: Imagine Mr. Bean stepping into Tom and Jerry---can we generate videos where characters interact naturally across different worlds? We study inter-character interaction in text-to-video generation, where the key challenge is to preserve each character's identity and behaviors while enabling coherent cross-context interaction. This is difficult because the characters may never have coexisted and because mixing styles often causes **style delusion**, where realistic characters appear cartoonish or vice versa. We introduce a framework that tackles these issues with Cross-Character Embedding (CCE), which learns identity and behavioral logic across multimodal sources, and Cross-Character Augmentation (CCA), which enriches training with synthetic co-existence and mixed-style data. Together, these techniques enable natural interactions between characters that have never coexisted, without losing stylistic fidelity. Experiments on a curated benchmark of cartoons and live-action series with 10 characters show clear improvements in identity preservation, interaction quality, and robustness to style delusion, enabling new forms of generative storytelling. Project page: https://mi-mi-x.github.io/.
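The CCA idea of enriching training with synthetic co-existence and mixed-style data can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual pipeline; the function name, sample fields, and prompt template are all hypothetical, since the abstract does not specify implementation details:

```python
import random

def cross_character_augment(samples, num_pairs, seed=0):
    """Hypothetical sketch of Cross-Character Augmentation (CCA):
    pair clips of characters from different style domains (e.g. cartoon
    vs. live-action) into synthetic co-existence training examples."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < num_pairs:
        a, b = rng.sample(samples, 2)
        # Only keep cross-style pairs, so the model sees mixed-style
        # co-existence data and is less prone to style delusion.
        if a["style"] != b["style"]:
            pairs.append({
                "characters": [a["character"], b["character"]],
                "styles": [a["style"], b["style"]],
                "prompt": f'{a["character"]} and {b["character"]} '
                          f'interact in one scene',
            })
    return pairs

# Example usage with toy metadata (characters named in the abstract):
samples = [
    {"character": "Tom", "style": "cartoon"},
    {"character": "Jerry", "style": "cartoon"},
    {"character": "Mr. Bean", "style": "live-action"},
]
augmented = cross_character_augment(samples, num_pairs=4)
```

Each synthetic pair combines characters that never coexisted in the source material, which is the co-existence signal CCA is described as providing.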
Supplementary Material: zip
Primary Area: generative models
Submission Number: 3735