Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement
Keywords: Ultra-high-definition, Latent Regularization, Image Restoration, Fidelity-Perception Trade-off
TL;DR: This paper presents the "Latent Harmony" framework, which achieves synergistic unified Ultra-High Definition (UHD) image restoration through latent space regularization and controllable refinement.
Abstract: Ultra-High Definition (UHD) image restoration struggles to balance computational efficiency and detail retention.
While Variational Autoencoders (VAEs) improve efficiency by operating in a compressed latent space, the Gaussian variational constraint preserves global semantics yet discards the degradation-specific high-frequency attributes, thereby compromising reconstruction fidelity.
Consequently, the VAE must be redesigned to learn a robust semantic representation that supports generalization and perceptual quality, while also processing the high-frequency information crucial for reconstruction fidelity.
To address this, we propose \textit{Latent Harmony}, a two-stage framework that reinvigorates VAEs for UHD restoration by concurrently regularizing the latent space and enforcing high-frequency-aware reconstruction constraints.
Specifically, Stage One introduces LH-VAE, which fortifies its latent representation through visual semantic constraints and progressive degradation perturbation for enhanced semantic robustness, while incorporating latent equivariance to bolster its high-frequency reconstruction capabilities.
Then, Stage Two facilitates joint training of this refined VAE with a dedicated restoration model.
This stage integrates High-Frequency Low-Rank Adaptation (HF-LoRA) with two distinct modules: an encoder LoRA, guided by a fidelity-oriented high-frequency alignment loss, extracts authentic details from degradation-sensitive high-frequency components, while a decoder LoRA, driven by a perception-oriented loss, synthesizes perceptually rich textures. The two LoRA modules are trained via alternating optimization with selective gradient propagation to preserve the integrity of the pre-trained latent structure, yielding a flexible fidelity-perception trade-off at inference controlled by an adjustable parameter $\alpha$.
Extensive experiments demonstrate that \textit{Latent Harmony} efficiently balances perceptual and reconstruction objectives, achieving superior restoration performance across diverse UHD and standard-resolution scenarios.
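To make the Stage One "latent equivariance" idea concrete, below is a minimal, hypothetical PyTorch sketch of an equivariance regularizer that encourages the encoder to commute with a simple spatial transform; the interface (`encoder`, the choice of horizontal flip, and the L1 penalty) is an illustrative assumption, not the paper's actual LH-VAE loss.

```python
import torch
import torch.nn.functional as F

def latent_equivariance_loss(encoder, x):
    """Sketch of a latent equivariance penalty: Enc(T(x)) should match T(Enc(x)),
    so spatially localized high-frequency structure survives latent compression.

    `encoder` is assumed to map an image batch (B, C, H, W) to a latent feature
    map with spatial dimensions; this is a hypothetical interface.
    """
    # Transform T: horizontal flip (any invertible spatial op would do in this sketch).
    x_t = torch.flip(x, dims=[-1])

    z = encoder(x)      # latent of the original input
    z_t = encoder(x_t)  # latent of the transformed input

    # Apply the same transform in latent space and penalize the mismatch.
    return F.l1_loss(z_t, torch.flip(z, dims=[-1]))
```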
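Similarly, the Stage Two $\alpha$-controlled refinement can be read as scaling a low-rank update on top of frozen pre-trained weights at inference. The sketch below is a generic LoRA layer with a runtime scaling factor, not the paper's HF-LoRA: the rank, the placement on encoder/decoder layers, and how $\alpha$ is shared between the two modules are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class AlphaScaledLoRALinear(nn.Module):
    """Frozen base linear layer plus a low-rank update whose contribution is
    scaled at inference by a user-chosen alpha (hypothetical sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)        # keep the pre-trained weights intact
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # the update starts as a no-op

    def forward(self, x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
        # alpha scales the low-rank refinement; alpha = 0 recovers the frozen
        # base layer. How alpha is distributed across the fidelity-oriented
        # encoder LoRA and perception-oriented decoder LoRA is not shown here.
        return self.base(x) + alpha * self.up(self.down(x))
```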
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 838