Morphosyntactic Parser for Old Czech

Published: 27 May 2026, Last Modified: 27 May 2026 | UniDive 2026 | Everyone | CC BY-SA 4.0
Keywords: Old Czech, Middle Czech, diachronic data, syntactic analysis, parsing, Universal Dependencies, STARK, periphrastic verb forms
Working Group: WG1: Corpus annotation, WG3: Multilingual and cross-lingual language technology
Abstract: Old Czech is the earliest phase of the Czech language documented in written records, dating from the mid-12th to the 16th century. Although Modern Czech is well represented in Universal Dependencies, Old Czech remains a low-resource language from an NLP perspective. Manual morphological annotation in the UD style is now being added to selected old texts at the Institute of the Czech Language (ÚJČ), but no syntactic annotation has been available so far. We present an adaptation of the UDPipe 2 parser to Old Czech, using a data sample that we annotated manually. Combining the adapted parser with the STARK treebank analysis tool enables the extraction and direct comparison of syntactic patterns in Modern and Old Czech.
Tracks For Type Of Contribution: Work in progress
Do You Need Visa To Attend The 4th UniDive General Meeting In Romania: No
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 50
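
The comparison of syntactic patterns described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the STARK or UDPipe 2 interfaces; it only assumes that parser output is available in the standard 10-column CoNLL-U format. The function names (`read_conllu`, `count_patterns`) and the three-token sample sentence are hypothetical and serve illustration only. The sketch counts (head UPOS, relation, dependent UPOS) triples, which is one simple way to make auxiliary constructions such as periphrastic verb forms visible.

```python
from collections import Counter

# Hypothetical toy sentence in CoNLL-U column order (not from the authors' data):
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
ROWS = [
    ("1", "Psal", "psát", "VERB", "_", "_", "0", "root", "_", "_"),
    ("2", "jsem", "být", "AUX", "_", "_", "1", "aux", "_", "_"),
    ("3", "dopis", "dopis", "NOUN", "_", "_", "1", "obj", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS) + "\n"


def read_conllu(text):
    """Yield each sentence as a list of token dicts."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        sentence.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if sentence:
        yield sentence


def count_patterns(text):
    """Count (head UPOS, deprel, dependent UPOS) triples, ignoring roots."""
    counts = Counter()
    for sent in read_conllu(text):
        upos_by_id = {tok["id"]: tok["upos"] for tok in sent}
        for tok in sent:
            if tok["head"] == 0:
                continue  # the root has no head token
            counts[(upos_by_id[tok["head"]], tok["deprel"], tok["upos"])] += 1
    return counts


if __name__ == "__main__":
    for pattern, n in count_patterns(SAMPLE).most_common():
        print(pattern, n)
```

Running `count_patterns` on an Old Czech file and on a Modern Czech file, then normalising each counter by its total, would give relative frequencies that can be compared directly. A dedicated tool such as STARK supports far richer tree queries than these flat triples.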