Handwritten Text Recognition and Browsing in Archive of Prisoners' Letters from Smolensk Convict Prison
Abstract: This paper considers the task of building a prototype navigation system for a small archive of historical documents (letters from prisoners of the Smolensk convict prison of the early 20th century) written in a single hand. To fit a handwritten text recognition model, procedures were developed for the automatic preparation of image collections, including splitting pages into lines, pen trace segmentation, and deslanting of lines and pages. Experiments showed that training a modern neural network on about a thousand line samples in the same handwriting yields decent recognition quality (5.11% CER and 17.55% WER). The automatically recognized text was then used for keyword search. Before indexing, the text was corrected using dictionaries and prescribed rules that take into account the peculiarities of Russian pre-reform spelling, recognition errors, and the scribe's own errors. The search engine reached a precision of 97.14% and a recall of 91.35%. The results were visualized by highlighting the found words on the original images. The study demonstrates that a navigation system can be built and fitted to a specific handwriting with a small number of labeled samples and limited human participation.
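The recognition quality above is reported as character error rate (CER) and word error rate (WER). As a minimal sketch of how such metrics are commonly computed, assuming the standard definition (Levenshtein edit distance divided by the reference length), the following Python functions may help; the names `edit_distance`, `cer`, and `wer` and the sample strings are illustrative assumptions, not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits per reference word (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: a pre-reform spelling in the reference line,
# with the final hard sign dropped by the recognizer.
reference = "письмо изъ тюрьмы"
hypothesis = "письмо из тюрьмы"
print(f"CER = {cer(reference, hypothesis):.4f}")
print(f"WER = {wer(reference, hypothesis):.4f}")
```

A single dropped character costs one edit at the character level but makes the whole word wrong at the word level, which is one reason WER is typically several times higher than CER for the same output.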