Post-OCR parsing: building simple and robust parser via BIO tagging

Wonseok Hwang; Seonghyeon Kim; Minjoon Seo; Jinyeong Yim; Seunghyun Park; Sungrae Park; Junyeop Lee; Bado Lee; Hwalsuk Lee

Published: 01 Nov 2019, Last Modified: 05 May 2023, DI 2019, Readers: Everyone

Keywords: Post-OCR, parsing, OCR, tagging, BIO

Abstract: Parsing textual information embedded in images is important for various downstream tasks. However, many previously developed parsers are limited to handling information presented in a one-dimensional sequence format. Here, we present Post OCR Tagging based parser (POT), a simple and robust parser that can parse visually embedded texts by BIO-tagging the output of an optical character recognition (OCR) task. Our shallow parsing approach enables building a robust neural parser with fewer than a thousand labeled examples. POT is validated on receipt and namecard parsing tasks.
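
To make the BIO-tagging idea concrete, the following is a minimal sketch of how tagged OCR tokens could be grouped back into fields. It is an illustration only, not the authors' implementation: it assumes the OCR tokens have already been serialized into reading order and that some model has already predicted one tag per token. The function name `decode_bio` and the label names (`item_name`, `item_count`, `item_price`) are hypothetical.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    "B-x" starts a new span with label x, "I-x" continues an open span
    with the same label, and "O" marks a token outside any field.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    label = None            # label of the span currently being built
    words: List[str] = []   # tokens collected for that span

    def flush() -> None:
        # Close the open span, if any, and reset the state.
        nonlocal label, words
        if label is not None:
            spans.append((label, " ".join(words)))
        label, words = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            # "O", or an "I-" tag that does not match the open span:
            # this sketch simply closes the span and drops the token.
            flush()
    flush()
    return spans


# Hypothetical receipt line after OCR and serialization.
example_tokens = ["Iced", "Latte", "2", "9.00", "Thank", "you"]
example_tags = ["B-item_name", "I-item_name", "B-item_count",
                "B-item_price", "O", "O"]
example_fields = decode_bio(example_tokens, example_tags)
```

Here `example_fields` holds one `(label, text)` pair per recovered field, with the untagged "Thank you" tokens left out. Dropping an inconsistent `I-` tag is one possible design choice; a more lenient decoder could treat it as the start of a new span instead.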

TL;DR: POST-OCR parsing
