Abstract: The convergence of virtual reality live streaming and AI-driven avatars has emerged as a significant technological trend. However, current integration attempts remain at the proof-of-concept stage, with the establishment of an automatic interaction system as the primary challenge. To build interactive, intelligent anime avatars within VR frameworks, we develop a multimodal interaction architecture centered on dialogue agents that realizes comprehensive understanding, reasoning, and response. Our approach 1) proposes high-granularity explicit-implicit understanding and a dual-center switchable reasoning mechanism to support flexible responses, 2) introduces a dual-source animation mechanism for co-speech face-body visualization and a textual command module for supervising cross-modal animation, and 3) enhances expressiveness by mapping persona, content, voice, and motion to anime style. Experimental results demonstrate the state-of-the-art performance of VRtalk, highlighting its practical significance and future potential.
DOI: 10.1109/ismar67309.2025.00125
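The abstract only names the pipeline's stages (explicit-implicit understanding, dual-center switchable reasoning, and a textual command module supervising animation). As a rough, hedged illustration of how such a loop could be wired together, here is a minimal Python sketch; every name in it (`Understanding`, `reason`, `animation_commands`, and the choice of "persona" vs. "content" as the two reasoning centers) is an assumption for illustration, not the paper's actual interface.

```python
from dataclasses import dataclass

# Hypothetical sketch of the pipeline the abstract describes; all names here
# are illustrative assumptions, not VRtalk's real API.

@dataclass
class Understanding:
    explicit: str   # what the user literally said
    implicit: str   # inferred cue (mood, intent), per the explicit-implicit split

def understand(utterance: str) -> Understanding:
    # Stand-in for the high-granularity explicit-implicit understanding stage:
    # a trivial heuristic replaces the paper's actual model.
    mood = "excited" if utterance.endswith("!") else "neutral"
    return Understanding(explicit=utterance, implicit=mood)

def reason(u: Understanding, center: str) -> str:
    # "Dual-center switchable" reasoning: route the same understanding through
    # one of two policies (persona-centered vs. content-centered is our guess
    # at what the two centers are).
    if center == "persona":
        return f"(in character) Let me think about '{u.explicit}'..."
    return f"Here is what I know about '{u.explicit}'."

def animation_commands(u: Understanding) -> list[str]:
    # Textual command module: emit supervision tokens that a co-speech
    # face-body animation layer could consume.
    return ["face:smile", "body:wave"] if u.implicit == "excited" else ["face:neutral"]

def respond(utterance: str, center: str = "persona") -> tuple[str, list[str]]:
    # One turn of the loop: understand, reason, then pair the reply with
    # cross-modal animation commands.
    u = understand(utterance)
    return reason(u, center), animation_commands(u)

if __name__ == "__main__":
    reply, commands = respond("Tell me about VR streaming!")
    print(reply)
    print(commands)
```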