Abstract: Real-time, zero-shot separation of facial attributes from a single real-face image, enabling attribute transfer and novel-view rendering without the aid of multi-view information, has proven beneficial in real-world scenarios. In this work, we propose an alternating optimization framework and train it on attribute-blended (i.e., unstructured) monocular images. Our framework leverages a pre-trained facial attribute encoder and a 3D-representation face synthesis decoder (e.g., HeadNeRF), which mutually reinforce and guide each other. This allows the facial attribute encoder to better express and separate facial attributes, and the face synthesis decoder to render faces with higher image similarity and attribute consistency.