FC-4DFS: Frequency-controlled Flexible 4D Facial Expression Synthesizing

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM2024 Poster · CC BY 4.0
Abstract: 4D facial expression synthesis is a critical problem in the fields of computer vision and graphics. Current methods lack flexibility and smoothness when simulating the inter-frame motion of expression sequences. In this paper, we propose a frequency-controlled 4D facial expression synthesis method, FC-4DFS. Specifically, we introduce a frequency-controlled LSTM network that generates a 4D facial expression sequence of a given length, frame by frame, from a given neutral landmark. Meanwhile, we propose a temporal coherence loss to enhance the perception of temporal motion and improve the accuracy of relative displacements. Furthermore, we design a Multi-level Identity-Aware Displacement Network based on a cross-attention mechanism to reconstruct 4D facial expression sequences from landmark sequences. Finally, our FC-4DFS achieves flexible, state-of-the-art generation of 4D facial expression sequences with different lengths on the CoMA and Florence4D datasets. The code will be available on GitHub.
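The abstract's temporal coherence loss can be illustrated with a minimal sketch. The paper does not give its exact form; the version below is a hypothetical interpretation that penalizes the mismatch between predicted and ground-truth inter-frame displacements of landmark sequences, which matches the stated goal of improving "the accuracy of relative displacements". The function name and the L1 penalty are assumptions, not the authors' implementation.

```python
import numpy as np

def temporal_coherence_loss(pred_seq, gt_seq):
    """Hypothetical temporal coherence loss sketch.

    pred_seq, gt_seq: arrays of shape (T, N, 3) -- T frames, N 3D landmarks.
    Compares frame-to-frame displacements rather than absolute positions,
    so it measures motion accuracy and smoothness, not pose accuracy.
    """
    # Inter-frame displacements: motion between consecutive frames.
    pred_disp = pred_seq[1:] - pred_seq[:-1]
    gt_disp = gt_seq[1:] - gt_seq[:-1]
    # L1 penalty on the displacement error (assumed choice of norm).
    return np.mean(np.abs(pred_disp - gt_disp))
```

Note that a sequence offset from the ground truth by a constant translation incurs zero loss here, since only relative motion between frames is compared; in practice such a term would be combined with a per-frame reconstruction loss.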
Primary Subject Area: [Generation] Generative Multimedia
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This work contributes to generative multimedia and multimedia applications by developing a novel framework for flexibly generating 4D facial expression sequences. This capability is particularly valuable in multimedia settings such as virtual reality, games, and animated films, where authenticity and expressiveness are critical. The integration of a frequency-controlled LSTM network and a Multi-level Identity-Aware Displacement Network (MIADNet) provides a refined method for animating 3D facial meshes. Conditioned on expression labels, the method flexibly generates scene-specific facial expression sequences from minimal prior information. The temporal coherence loss and positional encoding enhance sequence smoothness and continuity, which is important for consistent storytelling and character animation. Overall, the method adapts low-prior 4D facial expression sequence generation to diverse multimedia scenarios.
Supplementary Material: zip
Submission Number: 4062