Keywords: Text Image Generation; Text Image Super-Resolution; Diffusion Models
Abstract: Text Image Super-Resolution (TISR) aims to recover high-resolution images from low-resolution inputs degraded by unknown factors. The goal is to produce visually faithful output while preserving text readability and semantic consistency. Despite recent progress, existing methods struggle to preserve structure and visual fidelity under complex glyphs, severe degradations, or varied layouts. This shortfall mainly stems from three challenges: a lack of training data, limitations in model fidelity, and limited adaptability to complex layouts. To address these challenges, we propose **ReGenText**, a systematic solution for diverse text super-resolution. ReGenText integrates data generation, image restoration, and training strategies, substantially mitigating the three aforementioned challenges. Specifically, we propose: **Gen-Text**, a diffusion-based data generation framework that combines font styles and glyph priors to synthesize large-scale, high-quality text images, effectively alleviating data scarcity; **Re-Text**, a hybrid diffusion–GAN model that balances structural precision and fine-detail restoration for high-fidelity reconstruction; and **Bucket Training**, a training strategy that groups samples by text length and orientation, improving generalization to long and vertical texts. Extensive experiments show that ReGenText achieves state-of-the-art performance in both text recognition and visual quality across multiple benchmarks.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1398