Abstract: Although large multi-modality models (LMMs) have been widely explored and applied in quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' strong performance and robustness in low-level vision and quality assessment tasks, this study investigates whether PCQA knowledge can be imparted to LMMs through text supervision. To this end, we transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds. Because 2D projections lose part of the perception available in the 3D domain, we also extract structural features to compensate. The quality logits and structural features are then combined and regressed into quality scores. Our experimental results affirm the effectiveness of this approach, demonstrating a novel integration of LMMs into PCQA that improves assessment accuracy. We hope our contributions will inspire further work on combining LMMs with PCQA and advance 3D visual quality analysis more broadly.
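As a rough illustration of two steps named in the abstract (turning quality labels into text, and combining rating logits with structural features), a minimal sketch follows. It is not the authors' implementation: the five-word rating vocabulary, the equal-width binning, the softmax normalization, the plain concatenation, and the function names `label_to_text` and `fuse` are all assumptions made for illustration. The final regression to a quality score is left out.

```python
import math

# Assumed five-level rating vocabulary; the abstract does not specify one.
LEVELS = ["bad", "poor", "fair", "good", "excellent"]

def label_to_text(score, lo, hi):
    """Map a numeric quality label in [lo, hi] to a rating word.

    Uses equal-width bins, which is an assumption for this sketch.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    frac = (score - lo) / (hi - lo)
    # Clamp so that score == hi lands in the top bin and
    # out-of-range scores fall into the nearest bin.
    idx = max(0, min(int(frac * len(LEVELS)), len(LEVELS) - 1))
    return LEVELS[idx]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(rating_logits, structural_features):
    """Concatenate normalized rating logits with structural features.

    The result is a flat feature vector that a separate regressor
    (not shown) could map to a quality score. Concatenation is an
    assumed way of combining the two inputs.
    """
    return softmax(rating_logits) + list(structural_features)
```

For example, `label_to_text(5.0, 0.0, 10.0)` falls in the middle bin of this hypothetical vocabulary, and `fuse` returns a vector whose length is the number of rating levels plus the number of structural features.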
Primary Subject Area: [Experience] Interactions and Quality of Experience
Secondary Subject Area: [Generation] Multimedia Foundation Models
Relevance To Conference: This work contributes to multimedia/multimodal processing by integrating large multi-modality models (LMMs) into Point Cloud Quality Assessment (PCQA), a novel application in 3D visual quality analysis. LMMs have performed well in low-level vision and quality assessment tasks across various media types, but their application to PCQA has been unexplored. By transforming quality labels into textual descriptions and fine-tuning LMMs to interpret these together with 2D projections of point clouds, this study bridges a gap between 2D and 3D quality assessment. Structural features are included to compensate for the loss of 3D perception in the projections, supporting a more comprehensive quality evaluation. This integration aims to improve the robustness and accuracy of quality assessment in multimedia processing and paves the way for future research on combining LMMs with 3D data analysis, expanding the scope of multimodal processing in both theory and application.
Submission Number: 981