SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation

Published: 31 Mar 2025, Last Modified: 31 Mar 2025 · SyntaGen 2025 Poster · CC BY 4.0
Keywords: 3D Avatars, Single Image to 3D, Synthetic Data Generation
TL;DR: We present a pipeline leveraging video diffusion models and data augmentation to generate synthetic training data from a single image, enabling animatable 3D Gaussian avatars with consistent identity across poses.
Abstract: Creating animatable 3D human avatars from a single image remains a significant challenge with applications in virtual reality and human-centered AI. Traditional 3D Gaussian Splatting (3DGS) methods produce high-quality avatars but require monocular video sequences or multi-view inputs, while video diffusion models can animate from static images but struggle with temporal coherence and identity preservation. We present SVAD, a novel framework for synthetic data generation and avatar creation that addresses these limitations. SVAD leverages video diffusion models to generate an initial set of synthetic pose-conditioned animations from a single image, then enhances this synthetic data through identity preservation and image restoration modules. This high-quality synthetic dataset enables training of 3DGS avatar models that maintain subject fidelity and fine details across diverse poses and viewpoints. Our approach combines the generative capabilities of diffusion models with the rendering efficiency of 3DGS, resulting in state-of-the-art performance in single-image avatar creation. Experiments demonstrate that SVAD's synthetic data generation pipeline significantly improves temporal stability and identity consistency compared to existing methods, while enabling real-time rendering for interactive applications.
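The abstract describes a three-stage pipeline: pose-conditioned frame synthesis with a video diffusion model, enhancement via identity-preservation and image-restoration modules, and 3DGS avatar training on the resulting synthetic dataset. The orchestration can be sketched as below; every function name and data field here is a placeholder for illustration, not the authors' actual API.

```python
# Hypothetical sketch of the SVAD pipeline stages named in the abstract.
# All function names, fields, and scores are illustrative placeholders.

def generate_pose_conditioned_frames(image, poses):
    """Stage 1 (placeholder): a video diffusion model animates the
    single input image across a target pose sequence."""
    return [{"pose": p, "frame": image, "identity_score": 0.7} for p in poses]

def enhance_frames(frames):
    """Stage 2 (placeholder): identity-preservation and image-restoration
    modules refine the raw synthetic frames."""
    return [{**f, "identity_score": min(1.0, f["identity_score"] + 0.25)}
            for f in frames]

def train_3dgs_avatar(frames):
    """Stage 3 (placeholder): fit a 3D Gaussian Splatting avatar
    on the enhanced synthetic dataset."""
    return {"num_training_frames": len(frames),
            "mean_identity": sum(f["identity_score"] for f in frames) / len(frames)}

single_image = "subject.png"            # the only real input the method requires
target_poses = ["A-pose", "walk", "turn"]

raw = generate_pose_conditioned_frames(single_image, target_poses)
enhanced = enhance_frames(raw)
avatar = train_3dgs_avatar(enhanced)
print(avatar["num_training_frames"])    # 3
```

The key design point the abstract emphasizes is that only stage 1 sees the original image; stages 2 and 3 operate purely on the enhanced synthetic data, which is what lets a 3DGS model (normally needing video or multi-view input) train from a single photograph.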
Supplementary Material: zip
Submission Number: 16