Abstract: Convolutional neural networks (CNNs) have achieved state-of-the-art performance in recognizing and representing audio, images, videos, and 3D volumes; that is, domains where the input can be characterized by a regular graph structure.
However, generalizing CNNs to irregular domains like 3D meshes is challenging, and training data for 3D meshes is often limited. In this work, we generalize convolutional autoencoders to mesh surfaces. We perform spectral decomposition of meshes and apply convolutions directly in frequency space. In addition, we use max pooling and introduce upsampling within the network to represent meshes in a low-dimensional space. We construct a complex dataset of 20,466 high-resolution meshes with extreme facial expressions and encode it using our Convolutional Mesh Autoencoder. Despite limited training data, our method outperforms state-of-the-art PCA models of faces, achieving 50% lower error while using 75% fewer parameters.
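To illustrate the spectral convolution the abstract refers to, below is a minimal NumPy sketch: vertex features are transformed into the eigenbasis of the mesh graph Laplacian, filtered pointwise in frequency space, and transformed back. The adjacency matrix, the filter `theta`, and the function names are illustrative assumptions for exposition, not the paper's implementation (which additionally learns the filter parameters and interleaves pooling and upsampling layers).

```python
# Minimal sketch of a spectral convolution on a mesh graph (assumed setup,
# not the paper's code). Vertex features are filtered in the Laplacian
# eigenbasis: y = U diag(g_theta(lambda)) U^T x.
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    return np.eye(adj.shape[0]) - (d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :])

def spectral_conv(x, adj, theta):
    """Filter vertex features x of shape (n_vertices, channels) in frequency space."""
    lam, U = np.linalg.eigh(normalized_laplacian(adj))
    x_hat = U.T @ x                    # graph Fourier transform
    x_hat *= theta(lam)[:, None]       # pointwise spectral filtering
    return U @ x_hat                   # inverse transform back to vertices

# Toy example: a small mesh fragment with 4 vertices and xyz features
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
features = np.random.randn(4, 3)
smoothed = spectral_conv(features, adj, theta=lambda lam: np.exp(-lam))
```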
TL;DR: Convolutional autoencoders generalized to mesh surfaces for encoding and reconstructing extreme 3D facial expressions.
Keywords: meshes, convolutions, faces, autoencoder
Withdrawal: Confirmed