Keywords: Multi-Subject Video Generation, Reinforcement Learning for Generation, Semantic Understanding for Generation
Abstract: Video generative models pretrained on large-scale datasets can produce high-quality videos, but they are typically conditioned only on text or a single image, which limits their controllability and applicability.
We introduce ID-Composer, a novel framework that addresses this gap by tackling multi-subject video generation from a text prompt and reference images. This task is challenging as it requires preserving subject identities, integrating semantics across subjects and modalities, and maintaining temporal consistency.
To faithfully preserve subject consistency and textual information in the synthesized videos, ID-Composer introduces a **hierarchical identity-preserving attention mechanism** that effectively aggregates features within and across subjects and modalities.
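As a rough illustration only (not the authors' released implementation), the two-stage aggregation could be sketched in PyTorch as below; the module name, token shapes, and head count are assumptions made for exposition:

```python
import torch
import torch.nn as nn

class HierarchicalIDAttention(nn.Module):
    """Sketch of a two-stage attention: intra-subject aggregation,
    then cross-attention from video tokens to all subjects and text."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.intra_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_subject = nn.LayerNorm(dim)
        self.norm_video = nn.LayerNorm(dim)

    def forward(self, video_tokens, subject_tokens, text_tokens):
        # video_tokens:   (B, N_v, D)     latent video tokens
        # subject_tokens: (B, S, N_s, D)  tokens per reference subject
        # text_tokens:    (B, N_t, D)     prompt tokens
        B, S, N_s, D = subject_tokens.shape

        # Stage 1: aggregate features within each subject independently.
        flat = subject_tokens.reshape(B * S, N_s, D)
        intra, _ = self.intra_attn(flat, flat, flat)
        intra = self.norm_subject(intra).reshape(B, S * N_s, D)

        # Stage 2: video tokens attend jointly across subjects and the text prompt.
        context = torch.cat([intra, text_tokens], dim=1)
        out, _ = self.cross_attn(self.norm_video(video_tokens), context, context)
        return video_tokens + out
```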
To ensure the generated videos follow user intent, we introduce
**semantic understanding via a pretrained vision-language model (VLM)**, leveraging the VLM's superior semantic reasoning to provide fine-grained guidance and to capture complex interactions among multiple subjects.
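One plausible way to wire such guidance in, shown purely as an assumed sketch rather than the paper's interface, is to project frozen VLM token features into the generator's conditioning space and append them to the text conditioning:

```python
import torch
import torch.nn as nn

class VLMSemanticAdapter(nn.Module):
    """Sketch: map frozen VLM hidden states into the video model's
    conditioning space. Names and dimensions here are illustrative."""

    def __init__(self, vlm_dim: int, cond_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vlm_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, vlm_hidden_states: torch.Tensor, text_cond: torch.Tensor) -> torch.Tensor:
        # vlm_hidden_states: (B, N_vlm, vlm_dim) token features from a frozen VLM
        #   that has jointly read the prompt and the reference images.
        # text_cond:         (B, N_t, cond_dim) original text conditioning tokens.
        semantic_tokens = self.proj(vlm_hidden_states)
        # Concatenate fine-grained VLM guidance with the text conditioning stream.
        return torch.cat([text_cond, semantic_tokens], dim=1)
```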
Considering that the standard diffusion loss often fails to align critical concepts such as subject identity,
we employ an **online reinforcement learning phase** that extends the training objective of ID-Composer to reinforcement learning with verifiable rewards (RLVR).
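A minimal sketch of such an online RLVR-style update, under assumptions of our own (the `sample_with_logprob` and `reward_fn` interfaces are hypothetical, and the group-normalized REINFORCE objective is one common choice, not necessarily the paper's exact recipe):

```python
import torch

def rlvr_step(model, optimizer, prompts, ref_images, reward_fn, group_size=4):
    """Sample a group of videos per prompt, score them with a verifiable reward
    (e.g., subject-ID similarity to the reference images), and reinforce
    high-reward samples via group-normalized advantages."""
    # Hypothetical interface: returns videos (B, G, ...) and per-sample
    # log-probabilities (B, G) of the sampling trajectories.
    videos, logprobs = model.sample_with_logprob(prompts, ref_images, num_samples=group_size)

    # Hypothetical verifiable reward, e.g., identity similarity: (B, G).
    rewards = reward_fn(videos, prompts, ref_images)

    # Group-relative advantages: compare each sample to its own group.
    advantages = (rewards - rewards.mean(dim=1, keepdim=True)) / (
        rewards.std(dim=1, keepdim=True) + 1e-6
    )

    # REINFORCE-style objective: push up the likelihood of high-reward samples.
    loss = -(advantages.detach() * logprobs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), rewards.mean().item()
```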
Extensive experiments demonstrate that our model surpasses existing methods in identity preservation, temporal consistency, and video quality.
Code and training data will be released.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 3057