Understanding the Generalization of Blind Image Quality Assessment: A Theoretical Perspective on Multi-level Quality Features

23 Sept 2024 (modified: 03 Dec 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Machine Learning, Computer Vision, Image Quality Assessment, Generalization, Theoretical Guarantees
TL;DR: This paper proposes theoretical generalization bounds for IQA methods, revealing the trade-off between robust generalization and representation power, and provides theoretical guarantees for generalization research in the BIQA field.
Abstract: Due to the high annotation costs and relatively small scale of existing Image Quality Assessment (IQA) datasets, attaining consistent generalization remains a significant challenge for prevalent deep learning (DL)-based IQA methods. Although it is widely believed that quality perception information resides primarily in low-level image features, and that effective representation learning of multi-level image features and distortion information is crucial for the generalization of Blind IQA (BIQA) methods, the theoretical underpinnings of this belief remain elusive. In this work, we therefore investigate, from a theoretical perspective, the role of multi-level image features in the generalization and quality perception ability of CNN-based BIQA models. Regarding low-level features, Theorem 1 derives an upper bound on the Rademacher average and the corresponding generalization bound for the CNN-based BIQA framework under the assumption that the training and test sets share the same distribution; it indicates that generalization ability tends to degrade as the level of quality features increases, demonstrating the value of low-level features. In addition, under distribution shift, Theorem 2 proposes a much tighter generalization bound, which elucidates the theoretical impact of distributional differences between training and test sets on generalization performance. Regarding high-level features, Theorem 3 proves that BIQA networks tend to attain higher Betti number complexity when learning higher-level quality features, indicating greater representation power and smaller empirical error, and highlighting the value of high-level features. The three theorems provide theoretical support for the enhanced generalization achieved by existing BIQA methods. Furthermore, these findings reveal an inherent tension between robust generalization and strong representation power in BIQA networks, which motivates us to explore effective strategies for reducing empirical error without compromising generalization.
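Background note: for readers less familiar with the quantities referenced above, the following is a standard textbook form of a Rademacher-complexity-based generalization bound. It is included only as background and is not the specific bound of Theorem 1, whose constants and dependence on the feature level are derived in the paper. For a hypothesis class $\mathcal{F}$, a loss $\ell$ bounded in $[0,1]$, and an i.i.d. sample of size $n$, with probability at least $1-\delta$,

$$\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f(x),y)\big] \;\le\; \frac{1}{n}\sum_{i=1}^{n}\ell\big(f(x_i),y_i\big) \;+\; 2\,\mathfrak{R}_n(\ell\circ\mathcal{F}) \;+\; \sqrt{\frac{\ln(1/\delta)}{2n}},$$

where the Rademacher complexity of a class $\mathcal{G}$ is $\mathfrak{R}_n(\mathcal{G}) = \mathbb{E}_{S,\sigma}\big[\sup_{g\in\mathcal{G}} \frac{1}{n}\sum_{i=1}^{n}\sigma_i\, g(z_i)\big]$, with i.i.d. Rademacher signs $\sigma_i \in \{\pm 1\}$. Bounds of this form loosen as the complexity term grows, which is the general sense in which richer function classes (e.g., those capturing higher-level features) trade generalization guarantees for representation power.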
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2750