Chargrid-OCR: End-to-end trainable Optical Character Recognition through Semantic Segmentation and Object Detection

Christian Reisswig; Anoop R Katti; Marco Spinaci; Johannes Höhne

Published: 01 Nov 2019, Last Modified: 06 Jul 2025 | DI 2019 | Readers: Everyone

Keywords: OCR, Computer Vision, Tesseract, Printed documents, Document Intelligence

TL;DR: End-to-end trainable Optical Character Recognition on printed documents. Using a purely computer-vision-based approach, we achieve state-of-the-art results, beating Tesseract 4 on benchmark datasets in both accuracy and runtime.

Abstract: We present an end-to-end trainable approach for optical character recognition (OCR) on printed documents. It is based on predicting a two-dimensional character grid ('chargrid') representation of a document image as a semantic segmentation task. To identify individual character instances from the chargrid, we regard characters as objects and use object detection techniques from computer vision. We demonstrate experimentally that our method outperforms previous state-of-the-art approaches in accuracy while being easily parallelizable on GPU (thereby being significantly faster), as well as easier to train.
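To make the chargrid idea concrete, here is a minimal sketch of the representation, not the authors' implementation. It assumes a chargrid is a 2D grid in which every pixel covered by a character holds that character's class index and all other pixels hold a background index. The names `build_chargrid`, `decode_boxes`, `BACKGROUND`, and the majority-vote decoding step are hypothetical, chosen for illustration. In the paper both the grid and the boxes would be predicted by a network; here they are constructed by hand.

```python
from collections import Counter

BACKGROUND = 0  # assumed class index for non-character pixels


def build_chargrid(height, width, chars, vocab):
    """Rasterize (char, box) pairs into a 2D grid of class indices.

    Boxes are (x0, y0, x1, y1) with exclusive upper bounds.
    """
    grid = [[BACKGROUND] * width for _ in range(height)]
    for ch, (x0, y0, x1, y1) in chars:
        # Clip each box to the grid before filling it.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = vocab[ch]
    return grid


def decode_boxes(grid, boxes, vocab):
    """Read one character per box by majority vote over its pixels.

    Majority voting is an illustrative assumption, not the paper's method.
    """
    index_to_char = {i: ch for ch, i in vocab.items()}
    text = []
    for x0, y0, x1, y1 in boxes:
        votes = Counter(grid[y][x] for y in range(y0, y1) for x in range(x0, x1))
        votes.pop(BACKGROUND, None)  # ignore background pixels
        text.append(index_to_char[votes.most_common(1)[0][0]] if votes else "")
    return "".join(text)


# Toy example: three hand-placed characters on a 4 x 12 "image".
vocab = {"O": 1, "C": 2, "R": 3}
chars = [("O", (0, 0, 3, 4)), ("C", (4, 0, 7, 4)), ("R", (8, 0, 11, 4))]
grid = build_chargrid(4, 12, chars, vocab)
print(decode_boxes(grid, [box for _, box in chars], vocab))
```

The sketch separates the two roles named in the abstract: the grid plays the part of the semantic segmentation output, and the boxes play the part of the object detection output that isolates individual character instances.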

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/chargrid-ocr-end-to-end-trainable-optical/code)
