Abstract: 3D Gaussian splatting (3DGS) has demonstrated significant potential in audio-driven talking head synthesis. However, despite notable advances in speed and fidelity, current methods still suffer from inaccurate lip movements and facial artifacts. To address these issues, we propose LMTalker, a sparse landmark-guided 3DGS method that applies facial landmarks for the first time in 3DGS-based talking head synthesis. Our method explicitly leverages sparse facial landmarks to guide the deformation of dense Gaussians, effectively reducing inconsistencies between the input audio and facial dynamics and thereby improving lip movement accuracy and facial fidelity. Furthermore, we utilize facial landmarks hierarchically to achieve region-specific generation. By integrating audio information, we enhance clarity and reduce artifacts in the inner-mouth region. Experimental results demonstrate that our method surpasses existing methods in fidelity and lip movement accuracy while maintaining high rendering speed. Our project page is available at https://jiangzhiwen0520.github.io/LMTalker