Keywords: Text Image Generation; Text Image Super-Resolution; Diffusion Models
Abstract: Text Image Super-Resolution (TISR) aims to recover high-resolution images from low-resolution inputs degraded by unknown factors. The goal is to produce visually faithful output while preserving text readability and semantic consistency. Despite recent progress, existing methods struggle to preserve structure and visual fidelity under complex glyphs, severe degradations, or varied layouts. This shortfall mainly stems from three challenges: a lack of training data, limitations in model fidelity, and limited adaptability to complex layouts. To address these challenges, we propose **ReGenText**, a systematic solution for diverse text super-resolution. ReGenText integrates data generation, image restoration, and training strategies, substantially mitigating the three aforementioned challenges. Specifically, we propose: **Gen-Text**, a diffusion-based data generation framework that combines font styles and glyph priors to synthesize large-scale, high-quality text images, effectively alleviating data scarcity; **Re-Text**, a hybrid diffusion–GAN model that balances structural precision and fine-detail restoration for high-fidelity reconstruction; and **Bucket Training**, a training strategy that groups samples by text length and orientation, improving generalization to long and vertical texts. Extensive experiments show that ReGenText achieves state-of-the-art performance in both text recognition and visual quality across multiple benchmarks.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1398