What Governs the Quality-Aware Generalization and Representation Capacity in No-Reference Image Quality Assessment Models?
Keywords: Blind Image Quality Assessment, Generalization Bound, Representation Capacity, Low-level Features, Distribution Shifts
TL;DR: We derive theoretical bounds for generalization and representation capacity in deep learning-based blind image quality assessment, revealing a trade-off between generalization and representational capacity.
Abstract: Due to the high annotation costs and relatively small size of existing Image Quality Assessment (IQA) datasets, attaining both consistent generalization and strong quality representation capacity remains a significant challenge for prevalent deep learning (DL)-based Blind IQA (BIQA) methods. Although effective representation learning for distortion is deemed crucial for the generalization of BIQA methods, the theoretical underpinnings of this belief remain elusive. Therefore, in this study, we explore the theoretical quality-aware generalization bounds and representation capacity of DL-based IQA models, as well as the relationship between their respective determinants. For the generalization bound, under the assumption that training and test distributions are identical, we derive fine-grained and coarse-grained upper bounds on the BIQA generalization error using the covering number and VC dimension, respectively. These two results, presented in Theorem 1 and Theorem 2, reveal the role of low-level features in generalization. Under distribution shifts, we propose a tighter generalization bound in Theorem 3, based on the Intrinsic Dimension, to investigate how distributional differences between training and test sets affect BIQA generalization; this result further confirms the generalizing role of low-level features. For quality representation capacity, in Theorem 4, we quantify the representation capacity of BIQA models based on PAC-Bayes theory. This result demonstrates that learning higher-level quality features enhances quality representation capacity. Together, these theorems offer theoretical support for the improved performance of existing BIQA methods. Interestingly, our findings reveal an inherent tension between robust generalization and strong representation capacity in BIQA, which motivates strategies that lower empirical error without undermining generalization.
Extensive experiments confirm the reliability and practical value of our theorems.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6743