Keywords: Image super resolution, Image restoration, Diffusion models, Perception-distortion trade-off
Abstract: In image super-resolution (SR), perceptual quality and distortion form two competing objectives, bounded by the Perception-Distortion trade-off. GAN-based SR models reduce distortion but often fail to synthesize realistic fine-grained textures, while diffusion-based models generate perceptually plausible details but frequently hallucinate content, leading to fidelity loss. This raises a key challenge: how to harness the powerful generative priors of diffusion models without sacrificing fidelity. We introduce SpaSemSR, a Spatial-Semantic guided diffusion-based framework that addresses this challenge through two complementary guidance mechanisms. First, spatial-grounded textual guidance integrates object-level spatial cues with semantic prompts, reducing distortion by aligning the textual guidance with the visual structure. Second, semantic-enhanced visual guidance unifies multimodal semantic priors via a multi-encoder design with semantic degradation constraints, improving perceptual realism under severe degradations. These complementary guidance signals are adaptively fused with the diffusion priors via novel spatial-semantic attention mechanisms, curbing distortion and hallucination while preserving the strengths of generative diffusion models. Extensive experiments across multiple benchmarks demonstrate that SpaSemSR achieves a state-of-the-art balance between perception and distortion, producing both realistic and faithful restorations.
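The abstract describes fusing two guidance streams with diffusion features via spatial-semantic attention but does not specify the mechanism. Below is a minimal, hypothetical PyTorch sketch of one plausible reading: the diffusion features cross-attend separately to the spatial-grounded text tokens and the semantic visual tokens, and a learned gate adaptively blends the two results. All names (`SpatialSemanticAttention`, argument shapes, the gating scheme) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class SpatialSemanticAttention(nn.Module):
    """Hypothetical sketch of a spatial-semantic attention fusion block:
    diffusion features attend to two guidance streams (spatial-grounded
    text tokens and semantic visual tokens), and the two attended outputs
    are blended with a learned, per-channel adaptive gate."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.visual_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gate in [0, 1] deciding how much of each guidance stream to inject.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats, text_tokens, visual_tokens):
        # feats:         (B, N, dim) flattened diffusion U-Net features
        # text_tokens:   (B, T, dim) spatial-grounded textual guidance
        # visual_tokens: (B, V, dim) semantic-enhanced visual guidance
        q = self.norm(feats)
        t, _ = self.text_attn(q, text_tokens, text_tokens)
        v, _ = self.visual_attn(q, visual_tokens, visual_tokens)
        g = self.gate(torch.cat([t, v], dim=-1))  # (B, N, dim)
        # Residual injection of the adaptively blended guidance.
        return feats + g * t + (1.0 - g) * v


if __name__ == "__main__":
    # Toy shapes only; not the trained model.
    fuse = SpatialSemanticAttention(dim=320)
    feats = torch.randn(2, 64 * 64, 320)
    text = torch.randn(2, 77, 320)
    vis = torch.randn(2, 256, 320)
    print(fuse(feats, text, vis).shape)  # torch.Size([2, 4096, 320])
```

A gated residual of this form would let the network lean on textual grounding where structure must be preserved and on visual semantics where textures must be synthesized, which matches the trade-off the abstract claims; the actual fusion in SpaSemSR may differ.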
Primary Area: generative models
Submission Number: 2842