Learning Fine-Grained Image Representations for Mathematical Expression Recognition

Sidney Bender, Monica Haurilet, Alina Roitberg, Rainer Stiefelhagen

2019 (modified: 15 Nov 2022), GREC@ICDAR 2019, Readers: Everyone

Abstract: Optical character recognition is a key step towards automatically converting printed documents into electronic form. In this work, we consider the specialized task of mathematical expression recognition, which is characterized by a highly structured format and strict syntax rules, where even the slightest mistake can change the meaning of a formula. To tackle this problem, we present a neural architecture based on a convolutional neural network that focuses specifically on fine-grained structures in the image. The resulting visual representations are fed to an encoder and an attention-based decoder module, and all components are trained jointly in an end-to-end manner. Given an input image, our model generates the underlying LaTeX markup that describes the target mathematical formula. We conduct a thorough analysis of our model by examining its performance for different formula lengths and by visualizing the attention maps of example predictions. We demonstrate the effectiveness of our approach on the large-scale IM2LATEX-100K benchmark for mathematical expression recognition, where our model outperforms state-of-the-art methods by over 4% in image absolute accuracy.
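
To make the attention-based decoding step more concrete, the following is a minimal sketch of one attention read over a set of visual feature vectors. It is not the model from the paper: the abstract does not specify the attention form, so plain dot-product scoring is assumed here, and the names `attend`, `features`, and `query` are hypothetical. The returned weights are the kind of quantity that an attention map visualizes.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """One attention read (assumed dot-product scoring, not the paper's).

    features: list of feature vectors, one per image location.
    query:    decoder state vector with the same dimensionality.
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each image location against the current decoder state.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    # The context vector is the attention-weighted average of the features.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return weights, context


# Toy example: three image locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(features, query)
```

In a full decoder, a step like this would run once per generated LaTeX token, with `query` taken from the decoder's current hidden state.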
