VideoAlchemy: Open-set Personalization in Video Generation

16 Sept 2024 (modified: 15 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: generative models, video generation, content personalization, content customization
Abstract: Video personalization methods allow us to synthesize videos with specific concepts such as people, pets, and places. However, existing methods often focus on limited domains, require time-consuming per-subject optimization, or support only a single subject. We present VideoAlchemy, a video model with built-in multi-subject, open-set personalization for both foreground objects and backgrounds, eliminating the need for time-consuming test-time optimization. Our model is built on a new Diffusion Transformer module that fuses each reference-image conditioning and its corresponding subject-level text prompt with cross-attention layers. Developing such a large model presents two main challenges: $dataset$ and $evaluation$. First, since paired datasets of reference images and videos are extremely hard to collect, we instead sample video frames as reference images and synthesize the entire videos. This approach, however, introduces a data bias: models can easily denoise training videos but fail to generalize to new contexts at inference. To mitigate this issue, we carefully design a new automatic data-construction pipeline with extensive image augmentation and sampling techniques. Second, evaluating open-set video personalization is a challenge in itself. To address this, we introduce a new personalization benchmark with evaluation protocols that focus on accurate subject-fidelity assessment and accommodate different types of personalization conditioning. Finally, our extensive experiments show that our method significantly outperforms existing personalization methods in both quantitative and qualitative evaluations.
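The abstract describes the fusion module only at a high level: video tokens cross-attend to a per-subject conditioning sequence built from a reference-image embedding and its subject-level text prompt. As a rough, hedged sketch of that idea (not the authors' implementation — all shapes, names, and the concatenation strategy here are illustrative assumptions), a single cross-attention step could look like:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(video_tokens, cond_tokens, Wq, Wk, Wv):
    """Video tokens (queries) attend to conditioning tokens (keys/values).

    video_tokens: (N, d) latent video tokens inside a DiT block
    cond_tokens:  (M, d) per-subject conditioning sequence
    """
    Q = video_tokens @ Wq
    K = cond_tokens @ Wk
    V = cond_tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # (N, M) attention logits
    return softmax(scores, axis=-1) @ V         # (N, d) fused output

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

video = rng.standard_normal((4, d))             # 4 hypothetical video tokens
image_emb = rng.standard_normal((2, d))         # reference-image embedding tokens
text_emb = rng.standard_normal((3, d))          # subject-level text prompt tokens
# One plausible way to pair image and text conditioning: concatenate along the sequence axis.
cond = np.concatenate([image_emb, text_emb], axis=0)

out = cross_attention(video, cond, Wq, Wk, Wv)
print(out.shape)  # (4, 8): each video token is a conditioning-aware mixture
```

For multiple subjects, one could repeat this with each subject's concatenated (image, text) sequence, or stack all subjects' conditioning tokens into one sequence; the paper does not specify which variant is used.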
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1025