Keywords: Super-Resolution, Perceptual Loss, Invertible Neural Network, Modulated PatchNCE, Homeomorphism
TL;DR: Multi-Granular High-Frequency Perceptual Loss for Image Super-Resolution
Abstract: Innovations in perceptual losses have advanced the super-resolution (SR) literature, enabling the synthesis of realistic and detailed high-resolution images. However, most of these approaches rely on convolutional neural network (CNN)-based non-homeomorphic transforms, which lose information during guidance and often necessitate complex architectures and training procedures. To address these limitations, particularly the information loss and unwanted harmonics introduced by CNNs, we propose a diffeomorphic transform-based variant of a computationally efficient invertible neural network (INN) for a naive Multi-Granular High-Frequency (MGHF-n) perceptual loss, trained on ImageNet. Building on this foundation, we extend the framework into a comprehensive variant (MGHF-c) that integrates multiple constraints to preserve, prioritize, and regularize information across several aspects: texture and style preservation, content fidelity, regional detail preservation, and joint content-style regularization. Information is prioritized through adaptive entropy-based pruning and reweighting of INN features, while a content-style consistency regularizer curbs excessive texture generation and ensures content fidelity. To capture intricate local details, we further introduce modulated PatchNCE on INN features as a local information preservation (LIP) objective. On the theoretical side, we show that (1) the LIP objective compels the SR network to maximize the mutual information between the super-resolved and ground-truth modalities, and (2) a diffeomorphic transform-based perceptual loss learns the ground-truth distribution manifold more effectively than non-homeomorphic counterparts. Empirical results demonstrate that the proposed MGHF objective substantially improves both GAN- and diffusion-based SR algorithms across multiple evaluation metrics. The code will be released publicly after the review process.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9646