Abstract: Diffusion models (DMs) have significantly advanced real-world image super-resolution (Real-ISR), but the computational cost of multi-step diffusion models limits their application. One-step diffusion models generate high-quality images in a single sampling step, greatly reducing computational overhead and inference latency. However, most existing one-step diffusion methods are constrained by the performance of the teacher model: poor teacher performance results in image artifacts. To address this limitation, we propose FluxSR, a novel one-step diffusion Real-ISR technique based on flow matching models. We use the state-of-the-art diffusion model FLUX.1-dev as both the teacher model and the base model. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR model. Second, to improve image realism and address high-frequency artifacts in generated images, we propose TV-LPIPS as a perceptual loss and introduce Attention Diversification Loss (ADL) as a regularization term that reduces token similarity in the transformer, thereby eliminating high-frequency artifacts. Comprehensive experiments demonstrate that our method outperforms existing one-step diffusion-based Real-ISR methods. The code and model will be released at https://github.com/JianzeLi-114/FluxSR.
Lay Summary: Imagine you’ve taken a grainy, low-resolution photo with your phone and wish you could see every detail as if it were captured by a high-end camera. Recent AI systems called *diffusion models* can do this by gradually “painting in” missing details, but they normally need to repeat this process many times—like an artist layering dozens of brushstrokes—so they run slowly and demand a lot of computing power. Our work, **FluxSR**, shows how to achieve the same sharp results in just one swift step. We start with today’s best multi-step model and teach a lighter “student” version to jump straight to the finished picture without those extra strokes. To keep faces, textures and edges looking natural—without the strange speckles that sometimes appear—we add two new training tricks: one that judges realism more like a human viewer would, and another that nudges the AI to pay attention to varied visual cues instead of repeating itself. The result is a faster, more efficient tool that upgrades blurry images to crisp, photo-like quality in a fraction of a second, opening the door to smooth super-resolution on everyday devices and in real-time applications such as video calls or mobile photography.
Link To Code: https://github.com/JianzeLi-114/FluxSR
Primary Area: Applications->Computer Vision
Keywords: One-step Diffusion, FLUX.1-dev, flow matching models, Image Super-Resolution
Submission Number: 2835