Abstract: In recent years, immersive communication has emerged as a compelling alternative to traditional video communication methods. One prospective avenue for immersive communication involves augmenting the user's immersive experience through the transmission of three-dimensional (3D) talking heads (THs). However, transmitting 3D THs poses significant challenges due to their complex and voluminous nature, often leading to pronounced distortion and a compromised user experience. To address this challenge, we introduce the 3D Talking Heads Quality Assessment (THQA-3D) dataset, comprising 1,000 sets of distorted and 50 original TH mesh sequences (MSs), to facilitate quality assessment in 3D TH transmission. A subjective experiment, characterized by a novel interactive approach, is conducted with recruited participants to assess the quality of the MSs in the THQA-3D dataset. Leveraging this dataset, we also propose a multimodal Quality-of-Experience (QoE) method incorporating a Large Quality Model (LQM). This method projects the MSs frontally and renders them into videos, whose quality is then assessed with the LQM and a variable-length video memory filter (VVMF). Additionally, tone-lip coherence and silence detection techniques are employed to characterize audio-visual coherence in 3D MS streams. Experimental evaluation shows that the proposed method achieves state-of-the-art performance on the THQA-3D dataset and is competitive on other QoE datasets. Both the THQA-3D dataset and the QoE model have been publicly released at https://github.com/zyj-2000/THQA-3D.
Primary Subject Area: [Experience] Interactions and Quality of Experience
Relevance To Conference: This work contributes to multimedia/multimodal processing through its focus on both subjective and objective Quality-of-Experience (QoE) assessment for 3D talking heads (THs). Specifically, the study selects 10 head models and corresponding mesh sequences (MSs) as original materials. Afterward, 7 common distortions encountered during communication transmissions are simulated, resulting in the construction of an open-source 3D Talking Head Quality Assessment (THQA-3D) dataset, which consists of 1,000 distorted MSs. Furthermore, the research devises an interactive subjective quality assessment method tailored for 3D streaming media. This method presents both the audio and the MSs in full and lets subjects freely observe the THs from various viewpoints. Lastly, a multimodal fusion QoE approach assisted by a large quality model (LQM) is proposed. This method utilizes 2D frontal projections of MSs and audio, and extracts pertinent features from three perspectives: video quality, synchronization, and continuity. Finally, the proposed method incorporates psychological insights to enable effective quality assessment of 3D THs and classical QoE tasks.
Submission Number: 1985