- Decision: submitted, no decision
- Abstract: Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain: recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to this problem typically separate the localization, segmentation, and recognition steps. We propose a unified approach that integrates these three steps using a deep convolutional neural network that operates directly on the image pixels. The model is configured with 11 hidden layers, all with feedforward connections. We employ the DistBelief implementation of deep neural networks to scale our computations over this network. We evaluate this approach on the publicly available SVHN dataset and achieve over 96% accuracy in recognizing street numbers. On a per-digit recognition task, we improve upon the state of the art and achieve 97.84% accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery, containing several tens of millions of street number annotations, and achieve over 90% accuracy. Our evaluations further indicate that, at specific operating thresholds, the performance of the proposed system is comparable to that of human operators. To date, the system has helped us extract close to 100 million street numbers from Street View imagery worldwide.
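
The abstract reports three kinds of figures: whole-number (sequence) accuracy, per-digit accuracy, and performance at a chosen operating threshold. The sketch below illustrates how such metrics could be computed. It is not the authors' evaluation code: the function names, the string representation of street numbers, the per-example confidence scores, and the toy data are all assumptions made for illustration, and the paper's per-digit task may be defined differently (for example, on individually cropped digits).

```python
from typing import List, Tuple


def sequence_accuracy(preds: List[str], labels: List[str]) -> float:
    """Fraction of examples whose entire predicted number matches the label."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


def per_digit_accuracy(preds: List[str], labels: List[str]) -> float:
    """Fraction of label digits predicted correctly, compared position by position.

    Assumption: a label digit with no prediction at that position counts as wrong.
    """
    correct = 0
    total = 0
    for p, t in zip(preds, labels):
        total += len(t)
        correct += sum(a == b for a, b in zip(p, t))
    return correct / total


def accuracy_at_threshold(
    preds: List[str],
    labels: List[str],
    confidences: List[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy) when only predictions with confidence
    at or above `threshold` are accepted; the rest would go to a human."""
    kept = [(p, t) for p, t, c in zip(preds, labels, confidences) if c >= threshold]
    coverage = len(kept) / len(labels)
    accuracy = sum(p == t for p, t in kept) / len(kept) if kept else 0.0
    return coverage, accuracy


# Toy data (hypothetical, not from the paper).
preds = ["123", "45", "678", "9"]
labels = ["123", "46", "678", "8"]
confidences = [0.99, 0.60, 0.95, 0.40]

seq_acc = sequence_accuracy(preds, labels)
digit_acc = per_digit_accuracy(preds, labels)
coverage, thr_acc = accuracy_at_threshold(preds, labels, confidences, 0.9)
```

Raising the threshold trades coverage for accuracy: fewer predictions are accepted automatically, but those that are accepted are more likely to be correct. This trade-off is what "specific operating thresholds" refers to when a system is compared against human operators.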