Abstract: Highlights • A novel bottom-up method for detecting dense and arbitrary-shaped scene texts. • Explicitly learning repulsive links between close texts helps to separate dense text instances. • Instance-aware loss for bottom-up deep learning-based methods, further boosting the performance. • A dataset consisting of dense and arbitrary-shaped scene text of commodity images is introduced. • Significantly improved performance on dense and curved text detection. Abstract State-of-the-art methods have achieved impressive performances on multi-oriented text detection. Yet, they usually have difficulty in handling curved and dense texts, which are common in commodity images. In this paper, we propose a network for detecting dense and arbitrary-shaped scene text by instance-aware component grouping (ICG), which is a flexible bottom-up method. To address the difficulty in separating dense text instances faced by most bottom-up methods, we propose attractive and repulsive link between text components which forces the network learning to focus more on close text instances, and instance-aware loss that fully exploits context to supervise the network. The final text detection is achieved by a modified minimum spanning tree (MST) algorithm based on the learned attractive and repulsive links. To demonstrate the effectiveness of the proposed method, we introduce a dense and arbitrary-shaped scene text dataset composed of commodity images (DAST1500). Experimental results show that the proposed ICG significantly outperforms state-of-the-art methods on DAST1500 and two curved text datasets: Total-Text and CTW1500, and also achieves very competitive performance on two multi-oriented datasets: ICDAR15 (at 7.1FPS for 1280 × 768 image) and MTWI.
0 Replies
Loading