Latent representations for facial images and video editing. (Représentations latentes pour l'édition d'images et de vidéos de visages)

Xu Yao

2022 (modified: 02 Nov 2022)

Abstract: Learning to edit facial images and videos is one of the most popular tasks in both academic and industrial research. This thesis addresses the problem of face editing for the special case of high-resolution images and videos.

In this thesis, we develop deep learning-based methods to perform facial image editing. Specifically, we explore the task using the latent representations obtained from two types of deep neural networks: autoencoder-based models and generative adversarial networks (GANs). For each type of method, we consider a specific image editing problem and propose an effective solution that outperforms the state of the art.

The thesis contains two parts. In Part I, we explore image editing tasks via the latent space of autoencoders. First, we consider the style transfer task between photos and propose an effective algorithm built on a pair of autoencoder-based networks. Second, we study the face age editing task for high-resolution images, using an encoder-decoder architecture. The proposed network encodes a face image into age-invariant feature representations and learns a modulation vector corresponding to a target age. Our approach allows for fine-grained age editing on high-resolution images in a single unified model.

In Part II, we explore the editing task via the latent space of GANs. First, we consider the problem of disentangled facial attribute editing on synthetic and real images, and propose a latent transformation network that acts in the latent space of a pre-trained GAN model. We also propose a video manipulation pipeline to generalize the editing results to videos. Second, we investigate the problem of GAN inversion, that is, the projection of a real image into the latent space of a pre-trained GAN. In particular, we propose a feed-forward encoder that encodes a given image into a feature code and a latent code in a single pass. The proposed encoder is shown to be more accurate and stable for image and video inversion while maintaining good editing capacity.
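
As a rough illustration of the age-editing idea in Part I (age-invariant features combined with a modulation vector for a target age), the toy sketch below scales each feature channel by one weight derived from the target age. Everything in it is an assumption made for illustration: the function names `age_to_modulation` and `modulate`, the sinusoidal mapping that stands in for a learned network, and the choice of channel-wise scaling are hypothetical and are not taken from the thesis.

```python
import math


def age_to_modulation(target_age, num_channels):
    """Hypothetical stand-in for a learned mapping from a target age
    to a modulation vector (one weight per feature channel)."""
    t = target_age / 100.0
    return [1.0 + 0.5 * math.sin((c + 1) * t) for c in range(num_channels)]


def modulate(features, modulation):
    """Scale each feature channel by its modulation weight
    (channel-wise scaling is an assumption of this sketch)."""
    if len(features) != len(modulation):
        raise ValueError("one modulation weight per channel is required")
    return [[m * v for v in channel] for channel, m in zip(features, modulation)]


# Toy "age-invariant" features: 3 channels with 4 values each.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
    [-1.0, 0.0, 1.0, 2.0],
]

# The same features are re-used for every target age; only the
# modulation vector changes.
young = modulate(features, age_to_modulation(20, len(features)))
old = modulate(features, age_to_modulation(70, len(features)))
```

In this sketch a single set of features serves all target ages, which mirrors the "single unified model" framing of the abstract; a real system would pass the modulated features to a decoder to produce the edited image.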
