Abstract: The emergence of virtual reality technology has made stereoscopic omnidirectional images (SOIs) easily accessible, prompting the need to evaluate their perceptual quality. At present, most stereoscopic omnidirectional image quality assessment (SOIQA) methods rely on a single projection format, either Equirectangular Projection (ERP) or CubeMap Projection (CMP). However, ERP provides global information while the less distorted CMP complements it with local structural guidance, and research on leveraging both ERP and CMP in SOIQA remains limited, hindering a comprehensive understanding of global and local visual cues. Motivated by this gap, our study introduces a novel dual-stream perception-driven network for blind quality assessment of stereoscopic omnidirectional images. By integrating both ERP and CMP, our method effectively captures both global and local information, marking the first attempt to bridge this gap in SOIQA, particularly through deep learning methodologies. We employ an inter-intra feature fusion module that considers both the inter-projection complementarity between ERP and CMP and the intra-relationships within CMP images. This module dynamically and complementarily adjusts the contributions of features from both projections and integrates them into a more comprehensive perception. In addition, deformable convolution is employed to extract local regions of interest, simulating the orientation selectivity of the primary visual cortex. Finally, given the features of the left and right views of an SOI, a stereo cross attention module that simulates the binocular fusion mechanism is proposed to predict the quality score. Extensive experiments comparing our model with state-of-the-art competitors demonstrate that it achieves the best performance on the LIVE 3D VR, SOLID, and NBU databases.
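The abstract does not specify the internals of the stereo cross attention module, so the following is only a minimal PyTorch sketch of one plausible reading: each view's features query the other view via cross-attention before a pooled binocular feature is regressed to a score. The class name `StereoCrossAttention`, the feature dimension, the residual/normalization placement, and the regression head are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StereoCrossAttention(nn.Module):
    """Hypothetical sketch: left/right view features attend to each other
    (a rough stand-in for binocular fusion), then a pooled joint feature
    is regressed to a single quality score."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Left view queries right-view features, and vice versa.
        self.left_to_right = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.right_to_left = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Assumed regression head mapping the fused feature to a score.
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, 1))

    def forward(self, feat_l: torch.Tensor, feat_r: torch.Tensor) -> torch.Tensor:
        # feat_l, feat_r: (batch, tokens, dim) features of the left/right views.
        fused_l, _ = self.left_to_right(query=feat_l, key=feat_r, value=feat_r)
        fused_r, _ = self.right_to_left(query=feat_r, key=feat_l, value=feat_l)
        fused_l = self.norm(feat_l + fused_l)  # residual + norm, Transformer-style
        fused_r = self.norm(feat_r + fused_r)
        pooled = torch.cat([fused_l.mean(dim=1), fused_r.mean(dim=1)], dim=-1)
        return self.head(pooled)               # (batch, 1) predicted quality score

# Usage: two dummy view-feature tensors yield one score per image.
model = StereoCrossAttention(dim=256, num_heads=4)
score = model(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
print(score.shape)  # torch.Size([2, 1])
```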
Primary Subject Area: [Experience] Interactions and Quality of Experience
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This work advances multimedia processing by addressing the challenge of assessing the perceptual quality of stereoscopic omnidirectional images (SOIs), a crucial aspect of virtual reality (VR) content. By integrating both Equirectangular Projection (ERP) and CubeMap Projection (CMP) formats, the proposed dual-stream perception-driven network effectively captures global and local information, leveraging convolutional neural networks (CNNs) and Transformers. The inter-intra feature fusion module enhances feature selection, while deformable convolution mimics the orientation selectivity of the primary visual cortex, improving local region extraction (a minimal sketch follows below). Additionally, the stereo cross attention module simulates binocular fusion, which is essential for assessing stereoscopic image quality. This approach, combining deep learning techniques with models of visual perception mechanisms, not only improves SOI quality assessment but also extends to broader multimedia contexts, offering insights into processing multimodal data with diverse spatial and perceptual characteristics. Overall, this research represents a significant step toward a holistic understanding and evaluation of immersive multimedia experiences.
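As a companion to the deformable-convolution claim above, here is a minimal sketch of how learned sampling offsets let a convolution bend its grid toward salient local structure, using `torchvision.ops.DeformConv2d`. The wrapper class `DeformableROIExtractor`, the channel count, and the single-layer design are illustrative assumptions; the paper's actual configuration is not specified in this submission text.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableROIExtractor(nn.Module):
    """Hypothetical sketch: a plain conv predicts per-position sampling
    offsets, and a deformable conv samples the feature map at those
    shifted locations, loosely mimicking orientation-selective receptive
    fields."""

    def __init__(self, channels: int = 64, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (dy, dx) for each of the k*k kernel taps, per position.
        self.offset_pred = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_pred(x)   # (B, 2*k*k, H, W) learned offsets
        return self.deform(x, offsets)  # sample x on the deformed grid

feat = torch.randn(1, 64, 32, 32)
print(DeformableROIExtractor()(feat).shape)  # torch.Size([1, 64, 32, 32])
```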
Submission Number: 4678