Abstract: Recent diffusion models have made significant advances in generating lifelike videos from driving signals, such as a reference character and a skeleton sequence. Nevertheless, these models often struggle to maintain fidelity, as the generated results frequently deviate from the reference in character features, e.g., appearance and identity. We attribute this issue to the use of driving signals from the same individual during training, which biases the model towards skeleton-based shape features and limits its capacity to fully exploit character-specific information. To address this, we propose DreamHA. DreamHA equips diffusion models with Rigid Transformation Augmentation (RTAug), a simple yet effective technique that perturbs the shape characteristics of the training data, improving the ability of diffusion models to capture basic appearance features. Additionally, we introduce Identity Keeper (IK) to provide fine-grained facial control and enhance identity consistency. Extensive experimental results demonstrate that our method outperforms state-of-the-art approaches, producing more faithful and consistent animations.
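The abstract states only that RTAug perturbs the shape characteristics of the training data; its exact formulation is not given here. Below is a minimal, hypothetical sketch of what such an augmentation could look like, assuming 2D skeleton keypoints and a random rotation/scale/translation applied consistently across all frames of a driving sequence (the parameter ranges and joint layout are assumptions, not the paper's specification).

```python
# Hypothetical sketch of a rigid-transformation-style augmentation (RTAug).
# Assumption: driving skeletons are (T, J, 2) arrays of normalized 2D joints;
# the rotation, scale, and translation ranges below are illustrative only.
import numpy as np

def rt_augment(skeleton, max_rot_deg=15.0, max_trans=0.05, scale_range=(0.9, 1.1)):
    """Apply one random rigid-like transform to a (T, J, 2) skeleton sequence.

    The same transform is shared across all T frames, so motion dynamics are
    preserved while the pose's shape/placement is perturbed. This decouples
    skeleton geometry from the reference character's appearance.
    """
    theta = np.deg2rad(np.random.uniform(-max_rot_deg, max_rot_deg))
    scale = np.random.uniform(*scale_range)
    rot = scale * np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta),  np.cos(theta)]])
    trans = np.random.uniform(-max_trans, max_trans, size=2)

    center = skeleton.reshape(-1, 2).mean(axis=0)       # rotate about the pose centroid
    out = (skeleton - center) @ rot.T + center + trans  # s*R(x - c) + c + t
    return out.astype(skeleton.dtype)

# Usage: perturb the driving skeleton before conditioning the diffusion model,
# so shape/appearance cues must come from the reference image instead.
poses = np.random.rand(16, 18, 2).astype(np.float32)    # 16 frames, 18 joints
aug_poses = rt_augment(poses)
```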