Reinforcement Shrink-Mask for Text Detection

Chuang Yang, Mulin Chen, Yuan Yuan, Qi Wang

Published: 01 Jan 2023, Last Modified: 13 Nov 2024. IEEE Trans. Multim. 2023. CC BY-SA 4.0

Abstract: Existing real-time text detectors reconstruct text contours from shrink-masks alone. Although this simplifies the framework and lets the model run fast, the strong dependence on shrink-masks leads to unreliable detection results (e.g., missed detections and over-detections). Moreover, these methods ignore the information from surrounding pixels, which makes the shrink-masks sensitive and further degrades the reliability of the detection results. To address these problems, we construct an effective and efficient text detection network, termed Reinforcement Shrink-Mask for Text Detection (RSMTD), which strengthens the model's ability to recognize texts while retaining a high detection speed. Specifically, an effective text representation strategy, Reinforcement Shrink-Mask (RSM), is designed to decouple texts from shrink-masks. RSM builds texts from shrink-masks and reinforcement offsets, which keeps detection results stable even when the predicted shrink-masks deviate from the ground truth. Notably, the reinforcement offsets force our method to focus on foreground shapes, which yields precise shrink-mask edges. To improve the robustness of shrink-masks, a Super-pixel Window (SPW) is proposed to encourage RSMTD to use the surroundings of each pixel when predicting shrink-masks. In particular, SPW treats the interval regions between texts and shrink-masks as background, which helps to suppress interval regions and avoid text adhesion. Moreover, a lightweight feature merging branch is constructed to further accelerate inference. As demonstrated in the experiments, our method is superior to existing state-of-the-art (SOTA) methods in both detection accuracy and speed on multiple benchmarks.
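
For readers unfamiliar with shrink-masks, the following minimal sketch shows how earlier shrink-mask-based detectors commonly define the shrink distance of a text polygon, D = A(1 - r^2) / L, where A is the polygon area, L its perimeter, and r a shrink ratio. This is background only and is not RSMTD's method: the abstract does not specify how reinforcement offsets are computed, so they are not modeled here. The function names, the ratio of 0.4, and the restriction to axis-aligned boxes are assumptions made for illustration.

```python
import math


def polygon_area(points):
    """Absolute area of a simple polygon via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def shrink_offset(points, ratio=0.4):
    """Inward offset D = A * (1 - r^2) / L (ratio value is an assumption)."""
    return polygon_area(points) * (1.0 - ratio ** 2) / polygon_perimeter(points)


def shrink_axis_aligned_box(x_min, y_min, x_max, y_max, ratio=0.4):
    """Shrink an axis-aligned box by moving each side inward by D.

    Hypothetical helper: real text polygons are arbitrary shapes and need
    a polygon-clipping routine; a box keeps the sketch dependency-free.
    """
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    d = shrink_offset(corners, ratio)
    return (x_min + d, y_min + d, x_max - d, y_max - d)


if __name__ == "__main__":
    # A 100 x 20 text box: A = 2000, L = 240, so D is about 7 pixels.
    print(shrink_axis_aligned_box(0.0, 0.0, 100.0, 20.0))
```

The example illustrates why contours rebuilt from shrink-masks alone are fragile: the shrunk region of a 100 x 20 box is only about 6 pixels tall, so a small error in the predicted mask changes the reconstructed text region substantially.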