FASTER: A Font-Agnostic Scene Text Editing and Rendering Framework

Alloy Das, Sanket Biswas, Prasun Roy, Subhankar Ghosh, Umapada Pal, Michael Blumenstein, Josep Lladós, Saumik Bhattacharya

Published: 2025, Last Modified: 26 Feb 2026, WACV 2025, CC BY-SA 4.0
Abstract: Scene Text Editing (STE) is a challenging research problem that primarily aims to modify existing text in an image while preserving the background and the font style of the original text. Despite its utility in numerous real-world applications, existing style-transfer-based approaches have shown sub-par editing performance due to (1) complex image backgrounds, (2) diverse font attributes, and (3) varying word lengths within the text. To address these limitations, we propose a novel font-agnostic scene text editing and rendering framework, named FASTER, for simultaneously generating text in arbitrary styles and locations while preserving a natural and realistic appearance and structure. FASTER fuses target mask generation and style transfer units with a cascaded self-attention mechanism that focuses on multi-level text region edits to handle varying word lengths. Extensive evaluation on a real-world database, together with a subjective human evaluation study, indicates the superiority of FASTER in both scene text editing and rendering tasks in terms of model performance and efficiency. The code and pre-trained models have been released in our GitHub repository.
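The abstract names a cascaded self-attention mechanism but does not specify its design. As a rough, generic illustration only (not FASTER's actual architecture), the sketch below assumes a stack of scaled dot-product self-attention stages with identity query/key/value projections and residual connections, where each stage consumes the previous stage's output and every stage's output is kept as one feature level. The functions `self_attention` and `cascaded_self_attention` are hypothetical names chosen for this example.

```python
import math
from typing import List

Matrix = List[List[float]]  # one feature vector per token / text-region patch


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Identity projections are a simplifying assumption for illustration;
    trained models learn separate projection matrices.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in x:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x]
        weights = softmax(scores)
        # Weighted sum of value vectors (values are the inputs here).
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def cascaded_self_attention(x: Matrix, num_stages: int) -> List[Matrix]:
    """Apply attention stages in sequence; each stage adds its attention
    output to its input (residual) and passes the result to the next stage.
    Returns the output of every stage as a separate feature level."""
    levels = []
    h = x
    for _ in range(num_stages):
        attn = self_attention(h)
        h = [[a + b for a, b in zip(row_h, row_a)] for row_h, row_a in zip(h, attn)]
        levels.append(h)
    return levels


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
levels = cascaded_self_attention(tokens, num_stages=3)
print(len(levels), [len(level) for level in levels])
```

Keeping every stage's output, rather than only the last one, is one simple way to expose several levels of context to a downstream decoder; how FASTER actually combines its levels is described in the paper and released code, not here.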