Text-Aware Image Restoration with Diffusion Models

Published: 26 Jan 2026, Last Modified: 02 Mar 2026ICLR 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Diffusion Model, Image Restoration, Text-spotting, Scene-Text Image Super Resolution
TL;DR: We introduce TAIR, a new task for restoring images with text contents by combining diffusion-based restoration and text spotting. We also propose a dataset, SA-Text, enabling joint optimization of visual and textual fidelity.
Abstract: While diffusion models have achieved remarkable success in natural image restoration, they often fail to faithfully recover textual regions, frequently producing plausible yet incorrect text-like patterns, a phenomenon we term text-image hallucination. To address this limitation, we propose Text-Aware Image Restoration (TAIR), a task requiring simultaneous recovery of visual content and textual fidelity. For this purpose, we introduce SA-Text, a large-scale benchmark of 100K high-quality scene images with dense annotations of diverse and complex text instances. We further present a multi-task diffusion framework, TeReDiff, which leverages internal features of diffusion models to jointly train a text-spotting module with the restoration module. This design allows intermediate text predictions from the text-spotting module to condition the diffusion-based restoration process during denoising, thereby enhancing text recovery. Extensive experiments demonstrate that our approach faithfully restores textual regions, outperforms existing diffusion-based methods, and achieves new state-of-the-art results on TextZoom, an STISR benchmark considered a subtask of TAIR. The code, weights, and dataset will be publicly released.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2889
Loading