Abstract: Lip synchronization and talking face generation have gained significant interest from the research community with the advent of, and growing need for, digital communication across different fields. Prior works
propose several elegant solutions to this problem. However, they
often fail to create realistic-looking videos that account for people’s
expressions and emotions. To mitigate this, we build a talking face
generation framework conditioned on a categorical emotion to
generate videos with appropriate expressions, making them more realistic and convincing. Covering six emotions, i.e., anger, disgust, fear, happiness, neutral, and sad, we show that
our model generalizes across identities, emotions, and languages.
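To make the conditioning idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of how a categorical emotion label could be embedded and fused with audio and identity features inside a talking face generator; all module names, dimensions, and the toy decoder are assumptions for illustration only.

import torch
import torch.nn as nn

# The six emotion categories named in the abstract.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sad"]

class EmotionConditionedGenerator(nn.Module):
    # Hypothetical generator: fuses audio, identity, and a learned
    # emotion embedding, then decodes a low-resolution frame.
    def __init__(self, audio_dim=256, identity_dim=256, emotion_dim=64):
        super().__init__()
        self.emotion_embedding = nn.Embedding(len(EMOTIONS), emotion_dim)
        self.decoder = nn.Sequential(
            nn.Linear(audio_dim + identity_dim + emotion_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, 3 * 64 * 64),
        )

    def forward(self, audio_feat, identity_feat, emotion_idx):
        emo = self.emotion_embedding(emotion_idx)            # (B, emotion_dim)
        fused = torch.cat([audio_feat, identity_feat, emo], dim=-1)
        return self.decoder(fused).view(-1, 3, 64, 64)       # (B, 3, 64, 64)

# Usage: generate one frame conditioned on "happiness".
gen = EmotionConditionedGenerator()
audio = torch.randn(1, 256)
identity = torch.randn(1, 256)
emotion = torch.tensor([EMOTIONS.index("happiness")])
frame = gen(audio, identity, emotion)                        # shape (1, 3, 64, 64)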