Abstract: Recently, numerous deep learning based scene text detection methods have achieved promising performance on a range of text detection tasks. Most of these methods are trained in a supervised way, which requires a large amount of annotated data. In this paper, we explore a weakly supervised method for locating text regions in scene images. We propose a fully convolutional network (FCN) architecture that performs binary classification. The training data do not need any text location annotation; we only need to divide the images into two categories according to whether they contain text or not. The text localization map (TLM) is obtained directly from the last convolutional layer. By applying a fixed threshold, the TLM is converted to a mask map. Connected component analysis and a text proposal method based on Maximally Stable Extremal Regions (MSERs) are then used to obtain the bounding boxes of the text regions. We conduct comprehensive experiments on standard text datasets. The results show that our text localization method achieves recall comparable to that of other methods and is more stable.
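The post-processing described in the abstract (fixed threshold, mask map, connected component analysis, bounding boxes) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tlm_to_boxes`, the threshold value of 0.5, the use of 4-connectivity, and the toy score map are all assumptions made for illustration, and the MSER-based text proposal step is omitted entirely.

```python
from collections import deque


def tlm_to_boxes(tlm, threshold=0.5):
    """Threshold a 2-D score map and return one box per connected region.

    Hypothetical helper. Boxes are (row_min, col_min, row_max, col_max),
    with inclusive coordinates. The default threshold is an assumed value.
    """
    rows = len(tlm)
    cols = len(tlm[0]) if rows else 0

    # Step 1: a fixed threshold turns the score map into a binary mask.
    mask = [[score >= threshold for score in row] for row in tlm]

    seen = [[False] * cols for _ in range(rows)]
    boxes = []

    # Step 2: 4-connected component analysis via breadth-first search.
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                # Step 3: grow the bounding box to cover every pixel visited.
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Toy score map standing in for a TLM (values are made up).
toy_tlm = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.2, 0.1, 0.6],
    [0.0, 0.1, 0.2, 0.9],
]
boxes = tlm_to_boxes(toy_tlm, threshold=0.5)
```

On the toy map, the two high-scoring blobs are not adjacent, so they come out as two separate boxes. In practice the TLM would first be resized to the input image resolution, and a library routine for connected components would likely replace the hand-written search.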