OldSlavNet: A scalable Early Slavic dependency parser trained on modern language data

Nilo Pedrazzini, Hanne M. Eckhoff

Published: 2021, Last Modified: 07 Jun 2024. Softw. Impacts 2021. CC BY-SA 4.0

Abstract: Historical languages are increasingly being modelled computationally. Syntactically annotated texts are often a sine qua non for this modelling, but parsing pre-modern language varieties faces severe data sparsity, intensified by high levels of orthographic variation. In this paper we present a good-quality Early Slavic dependency parser, obtained by manipulating modern Slavic data to resemble the orthography and morphosyntax of pre-modern varieties. The tool can be deployed to expand historical treebanks, which are crucial for data collection and quantification, and beneficial to downstream NLP tasks and historical text mining.