Abstract: In this paper, we introduce a novel approach to single-image super-resolution (SISR) that balances perceptual quality and distortion through multi-objective optimization (MOO). Traditional pixel-based distortion metrics such as PSNR and SSIM often fail to align with human perceptual quality, so outputs can look blurry despite high scores. To address this, we propose the Multi-Objective Bayesian Optimization Super-Resolution (MOBOSR) framework, which dynamically adjusts loss weights during training. This reduces the need for manual hyperparameter tuning and lowers computational cost compared with AutoML. Our method treats the mappings from loss weights to image quality assessment (IQA) metrics as black-box objective functions and optimizes them to approach the best achievable perception-distortion Pareto frontier. Extensive experiments demonstrate that MOBOSR surpasses current state-of-the-art methods in both perception and distortion, significantly advancing the perception-distortion Pareto frontier. Our work lays a foundation for future exploration of the balance between perceptual quality and fidelity in image restoration tasks. Source code and pretrained models are available at: https://github.com/ZhuKeven/MOBOSR.
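To make the notion of a perception-distortion Pareto frontier concrete, the following minimal Python sketch filters a set of candidate models down to the non-dominated ones. It illustrates only the dominance test, not the MOBOSR optimization procedure itself. The function name `pareto_front`, the choice of LPIPS and PSNR as the two metrics, and all numeric values are assumptions made for illustration; they are not results from the paper.

```python
from typing import List, Tuple

# Each candidate is a hypothetical (lpips, psnr) pair:
# lower LPIPS means better perceptual quality, higher PSNR means lower distortion.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by LPIPS.

    A point is dominated if another point is at least as good on both
    metrics and strictly better on at least one of them.
    """
    front = []
    for i, (lpips_i, psnr_i) in enumerate(points):
        dominated = any(
            (lpips_j <= lpips_i and psnr_j >= psnr_i)
            and (lpips_j < lpips_i or psnr_j > psnr_i)
            for j, (lpips_j, psnr_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((lpips_i, psnr_i))
    return sorted(front)


# Illustrative values only, e.g. one pair per loss-weight setting.
candidates = [(0.10, 26.0), (0.12, 27.0), (0.15, 26.5), (0.20, 28.0), (0.11, 25.0)]
front = pareto_front(candidates)
print(front)
```

In this toy set, `(0.15, 26.5)` is dropped because `(0.12, 27.0)` is better on both metrics, and `(0.11, 25.0)` is dropped because `(0.10, 26.0)` is. The remaining points each trade one metric against the other, which is the balance the abstract refers to.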
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Generation] Generative Multimedia
Relevance To Conference: Images and videos are the most critical media in multimedia. Enhancing their quality to improve the viewing experience is one of the focal areas of multimedia research and a key interest of the ACM Multimedia Conference. Super-resolution is one of the most widely applied techniques for improving the perceptual quality of multimedia content. It has been deployed in practice across PCs, TVs, and mobile devices, significantly enhancing the experience of watching videos, viewing images, and even gaming. Yet traditional pixel-based metrics such as PSNR and SSIM, used for evaluating SR models, do not align well with human perceptual quality. As a result, outputs often appear blurry despite high metric scores, because they lack high-frequency details. This discrepancy has prompted the exploration of full-reference perceptual metrics such as LPIPS, which better reflect human visual perception. However, the inherent conflict between perception and distortion makes it challenging to achieve an optimal balance. Our research introduces multi-objective optimization (MOO) to SISR, aiming to balance perceptual quality with distortion. Our work is primarily dedicated to enhancing the perceptual quality of images, which we believe holds significant value for the multimedia community.
Supplementary Material: zip
Submission Number: 4436