SCRIBE: STROKE- AND CONTEXT-REGULARIZED TEST-TIME ADAPTATION FOR HANDWRITTEN TEXT RECOGNITION

08 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: HTR, TTA, OOD
Abstract: Handwritten text recognition (HTR) converts images of handwritten text, from lines to full pages, into accurate, machine-readable transcriptions. In practice, it often operates under distribution shift (new writers, historical substrates, scanning artifacts, layouts, and even cross-language use), precisely when target labels and source data are unavailable. Although recent foundation models perform well on their training distributions, their generalization across domains is fragile. Limitations in capacity, inadequate pretraining scale, or corpus–domain mismatch frequently lead to pronounced errors, underscoring the need for efficient adaptation even with state-of-the-art pretrained models. We address this need by adapting a foundation model at test time without labels or source data. To the best of our knowledge, this is the first HTR test-time adaptation approach that jointly optimizes a lightweight stroke-structure loss with a document-conditioned language prior, rather than treating linguistic (LM decoding/reranking) and visual (self-training/normalization) cues separately. Evaluated on four benchmarks (George Washington, IAM, RIMES, Bentham), our approach achieves an average absolute reduction of 0.0341 in character error rate (CER) and 0.0427 in word error rate (WER), corresponding to mean relative improvements of 20.8% and 12.8%, respectively. These findings demonstrate that integrating lightweight visual and linguistic priors provides an effective strategy for test-time adaptation in HTR.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 3193
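
The abstract reports results as absolute and relative reductions in CER and WER. As a minimal sketch of how such figures are commonly computed, assuming the standard edit-distance definitions of the two metrics, the Python below may help; the function names (`edit_distance`, `cer`, `wer`, `reductions`) and the example strings are hypothetical illustrations, not taken from the submission, and the paper's exact evaluation protocol (text normalization, averaging across datasets) is not specified here.

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def reductions(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute reduction, relative reduction) of an error rate."""
    absolute = before - after
    relative = absolute / before if before > 0 else 0.0
    return absolute, relative


if __name__ == "__main__":
    # Toy strings for illustration only; not data from the paper.
    reference = "the quick brown fox"
    before_adaptation = "the quikc brwn fox"
    after_adaptation = "the quick brwn fox"

    cer_before = cer(reference, before_adaptation)
    cer_after = cer(reference, after_adaptation)
    abs_red, rel_red = reductions(cer_before, cer_after)
    print(f"CER before={cer_before:.4f} after={cer_after:.4f}")
    print(f"absolute reduction={abs_red:.4f} relative reduction={rel_red:.1%}")
```

Note that a mean of per-benchmark relative improvements generally differs from the mean absolute reduction divided by the mean baseline error, so the two kinds of figures in the abstract need not be derivable from one another.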