Abstract: In this study, we aim to create interactive avatar agents that can autonomously animate nuanced facial movements in a manner that is realistic from both visual and behavioral perspectives. Given high-level inputs describing the environment and the agent's profile, our framework harnesses LLMs to produce a series of detailed text descriptions of the avatar agent's facial motions. Our task-agnostic driving engine then converts these descriptions into motion token sequences, which are mapped to continuous motion embeddings and consumed by our standalone neural renderer to generate the final photorealistic avatar animations. To our knowledge, we are the first to combine the planning and reasoning abilities of LLMs with neural rendering for generalized non-verbal behavior prediction and photorealistic rendering of avatar agents.
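To make the four-stage pipeline in the abstract concrete, here is a minimal sketch of the data flow. Every name below (AgentContext, plan_facial_motion, driving_engine.encode/decode, renderer.render) is a hypothetical assumption for illustration, not the paper's actual API.

```python
# Hypothetical sketch of the pipeline described in the abstract.
# All class/method names are illustrative assumptions, not the authors' API.
from dataclasses import dataclass

@dataclass
class AgentContext:
    environment: str  # high-level description of the surrounding scene
    profile: str      # the avatar agent's persona / profile

def plan_facial_motion(llm, ctx: AgentContext) -> list[str]:
    """Stage 1: the LLM turns high-level inputs into detailed
    text descriptions of the agent's facial motions."""
    prompt = (
        f"Environment: {ctx.environment}\n"
        f"Agent profile: {ctx.profile}\n"
        "Describe the agent's nuanced facial motions step by step."
    )
    return llm.generate(prompt).splitlines()

def animate(llm, driving_engine, renderer, ctx: AgentContext):
    descriptions = plan_facial_motion(llm, ctx)
    # Stage 2: the task-agnostic driving engine maps text descriptions
    # to discrete motion token sequences.
    motion_tokens = driving_engine.encode(descriptions)
    # Stage 3: tokens are converted into continuous motion embeddings.
    motion_embeddings = driving_engine.decode(motion_tokens)
    # Stage 4: the standalone neural renderer produces the final
    # photorealistic avatar animation frames.
    return renderer.render(motion_embeddings)
```

The sketch only fixes the interfaces between stages; the actual models behind `llm`, `driving_engine`, and `renderer` are the paper's contributions and are not specified here.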