Abstract: We present HeadArtist for 3D head generation following human-language descriptions. With a landmark-guided ControlNet serving as a generative prior, we come up with an efficient pipeline that optimizes a parameterized 3D head model under the supervision of the prior distillation itself. We call such a process self score distillation (SSD). In detail, given a sampled camera pose, we first render an image and its corresponding landmarks from the head model, and add some particular level of noise onto the image. The noisy image, landmarks, and text condition are then fed into a frozen ControlNet twice for noise prediction. We conduct two predictions via the same ControlNet structure but with different classifier-free guidance (CFG) weights. The difference between these two predicted results directs how the rendered image can better match the text of interest. Experimental results show that our approach produces high-quality 3D head sculptures with rich geometry and photo-realistic appearance, which significantly outperforms state-of-the-art methods. We also show that our pipeline supports editing operations on the generated heads, including both geometry deformation and appearance change. Project page:https://kumapowerliu.github.io/HeadArtist.
Loading