Abstract: This paper presents DFT-Net, a novel deep architecture that combines the advantages of Generative Adversarial Networks (GANs) and warping mechanisms for facial expression editing. Recent generative models use Action Units as annotations and allow more flexible expression manipulation than earlier approaches guided by other kinds of information. However, these methods introduce unavoidable artifacts wherever facial components deform (e.g., eyes moving from open to closed), owing to a structural defect: they model shape variations without geometric guidance such as facial landmarks. Our approach explicitly disentangles facial deformation from appearance detail by constructing two parallel networks: one learns an appearance flow for 2D warping, while the other generates the corresponding texture and hallucinates hidden regions such as the mouth interior. Experimental results show that our method outperforms the state of the art on a variety of expression editing tasks.
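To make the warping half of the pipeline concrete, the following is a minimal, hypothetical sketch of applying a dense appearance flow to an image with bilinear sampling. It is not the authors' implementation (the paper's networks predict the flow; here the flow is just an input), and the function names `bilinear_sample` and `warp_with_flow` are illustrative.

```python
import math

def bilinear_sample(img, y, x):
    """Sample a 2D image (list of lists of floats) at continuous
    coordinates (y, x) using bilinear interpolation with edge clamping."""
    h, w = len(img), len(img[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    wy, wx = y - math.floor(y), x - math.floor(x)
    top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
    bot = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
    return top * (1 - wy) + bot * wy

def warp_with_flow(img, flow):
    """Backward-warp an image by a per-pixel appearance flow:
    out[y][x] is sampled from img at (y + dy, x + dx), where
    flow[y][x] = (dy, dx). Coordinates are clamped to the image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0.0), h - 1.0)
            sx = min(max(x + dx, 0.0), w - 1.0)
            out[y][x] = bilinear_sample(img, sy, sx)
    return out
```

A zero flow reproduces the input image, and a constant flow shifts it; in the paper's setting, a network would predict such a flow field from the Action Unit target, while the parallel texture branch fills in regions the warp cannot reach.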