Maximizing Data Efficiency of HTR Models by Synthetic Text

Published: 2024, Last Modified: 10 Nov 2025, DAS 2024, CC BY-SA 4.0
Abstract: The usability of synthetic handwritten text for improving machine learning models is assessed for the domain of handwritten text recognition (HTR). Synthetic handwritten text is generated using an existing model based on a generative adversarial network (GAN). The output of this model is then used to train a state-of-the-art HTR model, which is applied to recognize real datasets. While training on this synthetic output results in a character error rate (CER) of 28.3% and a word error rate (WER) of 65.5% for line images of the IAM dataset, more than three times higher than the state-of-the-art result, our experiments show that the amount of real data in a mixed training set can be reduced significantly (by 70–80%) while still achieving CER and WER comparable to those obtained with real data. Using only 10% of the training data (113 images) from the CVL dataset results in a CER of 54.5% and a WER of 88.8%, whereas pre-training the model with synthetic data results in a CER of 14.6% and a WER of 43.4%.
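For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how CER and WER are commonly computed: the Levenshtein (edit) distance between the recognized text and the reference transcription, divided by the reference length in characters or words. The function names `edit_distance`, `cer`, and `wer` are illustrative, and the abstract does not state the exact evaluation procedure used (for example, how whitespace or punctuation is normalized), so this should be read as the standard definition rather than the authors' implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words.

    Assumption: words are separated by whitespace.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER for the same output, which matches the pattern in the figures reported above. Both rates can also exceed 100% when the hypothesis contains many insertions.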