Handwritten Text Recognition Adaptation for Low-Resource Languages: A Case Study on Historical Latin Manuscripts
Keywords: Handwritten Text Recognition; Transfer Learning
TL;DR: We present a method for digitizing historical Latin manuscripts that uses transfer learning to adapt modern English HTR models to historical Latin.
Abstract: Handwritten Text Recognition (HTR) remains a challenging task in document digitization, particularly for historical manuscripts written in low-resource languages such as Latin. In this paper, we focus on recognizing Latin text in 16th–18th century manuscripts, which exhibit a wide range of handwriting styles. To handle this variability, we propose AdapterTrOCR, a modular extension of the TrOCR model that incorporates two adapter modules: one for historical language adaptation and another for handwriting style adaptation. This architecture enables a robust transition from a generic English HTR model to one specialized in historical Latin. Because annotated data is scarce, we also explore Handwritten Text Generation (HTG) as a data augmentation strategy. Our results show that modular adaptation and synthetic data both improve HTR performance, reducing character error rate (CER) by 13.33% to 35.65% and word error rate (WER) by 8.56% to 27.72%.
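For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how CER and WER are commonly computed: Levenshtein edit distance between the reference transcription and the model output, at character and word level respectively, normalized by reference length. The function names and the sample Latin strings are illustrative assumptions, not taken from the paper, and the paper's own evaluation code may differ (for example in text normalization or tokenization).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: one wrong character in a three-word line.
reference = "gallia est omnis"
hypothesis = "gallia est omnes"
print(f"CER = {cer(reference, hypothesis):.4f}")
print(f"WER = {wer(reference, hypothesis):.4f}")
```

A single character error affects one of 16 characters but one of 3 words, which is why WER is typically higher than CER on the same output.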
Supplementary Material: pdf
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 17459