Semantic-Aware and Quality-Aware Interaction Network for Blind Video Quality Assessment

Published: 20 Jul 2024 · Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Current state-of-the-art video quality assessment (VQA) models typically integrate various perceptual features to comprehensively represent video quality degradation. These models either directly concatenate features or fuse different perceptual scores while ignoring the domain gaps between cross-aware features, and thus fail to adequately learn the correlations and interactions between different perceptual features. To this end, we analyze the independent effects of quality- and semantic-aware features on video quality and the information gaps between them. Based on an analysis of the spatial and temporal differences between the two aware features, we propose a semantic-**A**ware and quality-**A**ware **I**nteraction **Net**work (**A$^2$INet**) for blind VQA (BVQA). For spatial gaps, we introduce a cross-aware guided interaction module to enhance the interaction between semantic- and quality-aware features in a local-to-global manner. Considering temporal discrepancies, we design a cross-aware temporal modeling module to further perceive temporal content variation and quality saliency information, and the perceptual features are regressed into a quality score by a temporal network followed by temporal pooling. Extensive experiments on six benchmark VQA datasets show that our model achieves state-of-the-art performance, and ablation studies further validate the effectiveness of each module. We also present a simple video sampling strategy to balance the effectiveness and efficiency of the model. The code for the proposed method will be released.
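The abstract describes two aware feature branches that interact spatially and are then modeled temporally and pooled into a score. Below is a minimal sketch of that general idea, not the authors' released code: the module names, cross-attention-based interaction, GRU temporal network, and average temporal pooling are all illustrative assumptions.

```python
# Illustrative sketch of cross-aware interaction + temporal modeling/pooling.
# All module and tensor names are hypothetical; shapes are toy-sized.
import torch
import torch.nn as nn


class CrossAwareInteraction(nn.Module):
    """Lets quality-aware features attend to semantic-aware features, and vice versa."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.q_to_s = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.s_to_q = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, sem: torch.Tensor, qual: torch.Tensor):
        # sem, qual: (batch, frames, dim)
        qual_enh, _ = self.q_to_s(qual, sem, sem)  # quality features guided by semantics
        sem_enh, _ = self.s_to_q(sem, qual, qual)  # semantic features guided by quality
        return sem_enh + sem, qual_enh + qual      # residual fusion


class TemporalQualityHead(nn.Module):
    """Temporal network (GRU) followed by temporal average pooling to one score."""

    def __init__(self, dim: int):
        super().__init__()
        self.gru = nn.GRU(dim * 2, dim, batch_first=True)
        self.fc = nn.Linear(dim, 1)

    def forward(self, sem: torch.Tensor, qual: torch.Tensor):
        x = torch.cat([sem, qual], dim=-1)   # fuse the two aware features per frame
        x, _ = self.gru(x)
        scores = self.fc(x).squeeze(-1)      # per-frame quality scores
        return scores.mean(dim=1)            # temporal pooling -> video-level score


# Toy usage: 2 videos, 16 sampled frames, 256-d features per aware branch.
sem = torch.randn(2, 16, 256)
qual = torch.randn(2, 16, 256)
interact = CrossAwareInteraction(256)
head = TemporalQualityHead(256)
sem_f, qual_f = interact(sem, qual)
print(head(sem_f, qual_f).shape)  # torch.Size([2])
```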
Primary Subject Area: [Experience] Interactions and Quality of Experience
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Video quality assessment (VQA) aims to enable models to perceive the visual quality of videos and produce results consistent with human subjective opinions, making it a popular research topic in multimedia. Our work contributes to the field of multimedia processing and aligns closely with the conference's "Experience" theme. Specifically, our work improves the model's performance in predicting video quality by designing a semantic-aware and quality-aware interaction network that enhances the spatial and temporal interactions between the two aware features. To the best of our knowledge, this is the first attempt to explore the gaps between various perceptual features and utilize cross-aware feature learning to enhance the perceptual capabilities of blind VQA models. Extensive experiments validate that the proposed method outperforms state-of-the-art VQA models on all six benchmark VQA datasets.
Supplementary Material: zip
Submission Number: 1181
