Subject-driven Video Generation Emerges from Experience Replays

17 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: video generation, customization, personalization, diffusion models, continual learning
TL;DR: We employ continual learning with video replay and adjust the replay ratio dynamically, achieving on-par subject fidelity and motion with lower compute than state-of-the-art models.
Abstract: We aim to enable efficient subject-to-video (S2V) learning, which otherwise requires expensive video–subject pair datasets and tens of thousands of GPU hours of training. Using image-paired datasets to train video models could address this cost, but naively training on image pairs causes a catastrophic loss of temporal ability due to gradient conflicts. We hypothesize that S2V generation decomposes into two orthogonal objectives: identity learning from images and temporal-dynamics learning from videos. Based on this orthogonality assumption, we design a stochastic task-switching strategy that predominantly samples from image datasets while maintaining minimal video replay for temporal coherence. Our experiments validate this hypothesis by demonstrating that the gradient inner product between the two tasks converges exponentially to near zero, confirming emergent orthogonalization without requiring explicit orthogonal projection. This validated orthogonality enables efficient image-dominant training while preventing catastrophic forgetting through proxy experience replay. We further employ regularization techniques, including random frame selection and token dropping during video replay, to ensure efficient temporal learning. Extensive experiments demonstrate that our approach achieves superior performance at compute comparable to per-subject tuned methods for single subjects, while providing zero-shot capability and outperforming both per-subject tuned methods and some existing zero-shot approaches.
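The abstract describes two mechanisms that lend themselves to a brief illustration: a stochastic task-switching sampler that mostly draws image-identity batches with occasional video replay, and a gradient inner-product check used to monitor emergent task orthogonality. Below is a minimal Python sketch of both ideas; the function names (`make_task_switching_sampler`, `gradient_cosine`), the default `replay_ratio`, and the use of cosine similarity rather than a raw inner product are illustrative assumptions, not the paper's actual implementation.

```python
import itertools
import random

import torch
import torch.nn.functional as F


def make_task_switching_sampler(image_batches, video_batches,
                                replay_ratio=0.1, seed=0):
    """Yield (task, batch) pairs: mostly image-identity batches, with
    video-replay batches interleaved at the given ratio.

    Finite iterables are cycled so the generator never exhausts;
    `replay_ratio` is a hypothetical knob standing in for the paper's
    dynamically adjusted replay schedule.
    """
    rng = random.Random(seed)
    image_iter = itertools.cycle(image_batches)
    video_iter = itertools.cycle(video_batches)
    while True:
        if rng.random() < replay_ratio:
            yield "video_replay", next(video_iter)
        else:
            yield "image_identity", next(image_iter)


def gradient_cosine(model, image_loss, video_loss):
    """Cosine similarity between the flattened gradients of the two task
    losses; values near zero would indicate the (emergent) orthogonality
    the abstract refers to."""
    flat = []
    for loss in (image_loss, video_loss):
        grads = torch.autograd.grad(loss, model.parameters(),
                                    retain_graph=True, allow_unused=True)
        flat.append(torch.cat([g.flatten() for g in grads if g is not None]))
    return F.cosine_similarity(flat[0], flat[1], dim=0)
```

In a training loop, one would draw the next `(task, batch)` pair from the sampler at each step and periodically log `gradient_cosine` on held-out image and video batches to track whether the two objectives stay near-orthogonal as training proceeds.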
Supplementary Material: zip
Primary Area: generative models
Submission Number: 8758