MONSTERMASH: Multidirectional, Overlapping, Nested, Spiral Text Extraction for Recognition Models of Arabic-Script Handwriting

Published: 01 Jan 2024, Last Modified: 19 Feb 2025. Venue: ICDAR (Workshops 2) 2024. License: CC BY-SA 4.0
Abstract: Most current models for handwritten text recognition transcribe individual lines and thus depend on accurate line extraction from page images. This line extraction task is particularly challenging for Arabic-script manuscripts, which exhibit a high proportion of curved lines, word baselines that vary within the line, and varying line orientation on the page. We present a new corpus for studying Arabic-script line extraction in the presence of these phenomena and evaluate different model architectures using several pixel-level, object-level, and extrinsic recognition metrics. Training all models on the same data, we find that the CNN-based Kraken model slightly outperforms the transformer-based TESTR model on recognition character accuracy and some object-level metrics, even though it lags behind on pixel-level metrics.