FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation

Xinya Ji; Sebastian Weiss; Manuel Kansy; Jacek Naruniec; Xun Cao; Barbara Solenthaler; Derek Bradley

Published: 26 Jan 2026, Last Modified: 17 May 2026, Venue: ICLR 2026 Poster, License: CC BY 4.0

Keywords: Animation, Gaussian Avatar, Feedforward Gaussian Model

TL;DR: A new method for creating high-quality 3D Gaussian head avatars from a few input images, with support for real-time dynamic animation.

Abstract: Despite recent progress in 3D Gaussian-based head avatar modeling, efficiently generating high-fidelity avatars remains a challenge. Current methods typically rely on extensive multi-view capture setups or monocular videos with per-identity optimization during inference, limiting their scalability and ease of use on unseen subjects. To overcome these efficiency drawbacks, we propose FastGHA, a feed-forward method to generate high-quality Gaussian head avatars from only a few input images while supporting real-time animation. Our approach directly learns a per-pixel Gaussian representation from the input images and aggregates multi-view information using a transformer-based encoder that fuses image features from both DINOv3 and Stable Diffusion VAE. For real-time animation, we extend the explicit Gaussian representations with per-Gaussian features and introduce a lightweight MLP-based dynamic network to predict 3D Gaussian deformations from expression codes. Furthermore, to enhance the geometric smoothness of the 3D head, we employ point maps from a pre-trained large reconstruction model as geometry supervision. Experiments show that our approach significantly outperforms existing methods in both rendering quality and inference efficiency, while supporting real-time dynamic avatar animation.
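
The animation step described in the abstract (per-Gaussian features combined with an expression code and passed through a lightweight MLP to predict deformations) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `init_mlp` and `deform_gaussians`, the layer sizes, the feature and expression dimensions, and the choice to predict only position offsets are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Create weights for a small fully connected network (hypothetical layer sizes)."""
    return [(rng.normal(0.0, 0.01, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def deform_gaussians(positions, features, expression, params):
    """Add an MLP-predicted offset to each canonical Gaussian position.

    Assumption: only positions are deformed here; a full model could also
    predict changes to rotation, scale, opacity, or color.
    """
    n = positions.shape[0]
    # Share the single expression code across all Gaussians and
    # concatenate it with each Gaussian's own feature vector.
    x = np.concatenate([features, np.tile(expression, (n, 1))], axis=1)
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return positions + x  # canonical position plus predicted offset

rng = np.random.default_rng(0)
num_gaussians, feat_dim, expr_dim = 1024, 32, 64  # illustrative sizes, not from the paper
params = init_mlp(rng, [feat_dim + expr_dim, 128, 3])
positions = rng.normal(size=(num_gaussians, 3))
features = rng.normal(size=(num_gaussians, feat_dim))
expression = rng.normal(size=expr_dim)
deformed = deform_gaussians(positions, features, expression, params)
```

Because the network is evaluated independently per Gaussian with a shared expression code, changing the expression only requires one batched forward pass through a small MLP, which is consistent with the real-time animation goal stated in the abstract.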

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 484
