Abstract: Scene text removal aims to remove text from scene images and fill the resulting regions with plausible, realistic content. The task implicitly comprises two sub-tasks: text perception and text removal. However, most existing methods either ignore this decomposition or simply split the task into two consecutive stages, without considering how the two sub-tasks can reinforce each other. With suitable transformations, better segmentation results can better guide text removal, and vice versa. The two sub-tasks can therefore promote each other and co-evolve, forming an intertwined, spiraling process reminiscent of the double helix structure of deoxyribonucleic acid (DNA). In this paper, we propose HelixNet, a novel network with Dual Helix Cooperative Decoders for scene text removal. It is an end-to-end, one-stage model with one shared encoder and two interacting decoders for the text segmentation and text removal sub-tasks. Through dual-branch information interaction, the model fuses complementary information from each sub-task, so that text segmentation and text removal inform each other. We evaluate the proposed method extensively on commonly used, publicly available real and synthetic datasets. The experimental results demonstrate the benefit of the specially designed decoders and show that HelixNet achieves state-of-the-art performance.
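To make the data flow of "one shared encoder and two interacting decoders" concrete, here is a toy 1-D sketch in plain Python. It is not HelixNet's actual architecture: the pair-averaging encoder, the per-branch gains, the `exchange` fusion rule (each branch adds a weighted copy of the other after every decoder stage), and the final mask-based compositing are all hypothetical stand-ins chosen only to show where the cross-branch interaction happens.

```python
import math
from typing import List, Tuple

Vec = List[float]


def downsample(x: Vec) -> Vec:
    """Average adjacent pairs: a toy stand-in for a strided convolution."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: Vec) -> Vec:
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def shared_encoder(image: Vec, depth: int) -> List[Vec]:
    """One encoder whose multi-scale features feed BOTH decoders."""
    feats = [image]
    for _ in range(depth):
        feats.append(downsample(feats[-1]))
    return feats  # feats[0] is full resolution, feats[-1] is the coarsest


def decode_stage(x: Vec, skip: Vec, gain: float) -> Vec:
    """Upsample, scale by a branch-specific (hypothetical) gain, add the encoder skip."""
    return [gain * u + s for u, s in zip(upsample(x), skip)]


def exchange(a: Vec, b: Vec, w: float) -> Tuple[Vec, Vec]:
    """Hypothetical cross-branch fusion: each branch adds a weighted copy of the other."""
    return ([x + w * y for x, y in zip(a, b)],
            [y + w * x for x, y in zip(a, b)])


def dual_decoders(image: Vec, depth: int = 2, w: float = 0.5,
                  seg_gain: float = 1.0, rem_gain: float = 0.5) -> Tuple[Vec, Vec]:
    feats = shared_encoder(image, depth)
    seg, rem = feats[-1], feats[-1]  # both decoders start from the shared bottleneck
    for skip in reversed(feats[:-1]):
        seg = decode_stage(seg, skip, seg_gain)  # text segmentation decoder stage
        rem = decode_stage(rem, skip, rem_gain)  # text removal decoder stage
        seg, rem = exchange(seg, rem, w)         # interaction after every stage
    mask = [1 / (1 + math.exp(-v)) for v in seg]  # per-pixel text probability
    # Assumed compositing: keep input pixels where no text is predicted,
    # use the removal branch where text is predicted.
    out = [m * r + (1 - m) * x for m, r, x in zip(mask, rem, image)]
    return mask, out


mask, out = dual_decoders([0.1, 0.9, 0.2, 0.8, 0.5, 0.5, 0.0, 1.0])
```

Because the two decoders swap features at every resolution rather than only at the end, an improvement in one branch's intermediate features immediately changes the other's input, which is the "spiraling" mutual refinement the abstract describes, in miniature.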