Abstract: This paper describes the development of
the first syntactically-annotated corpus of
Welsh within the Universal Dependencies
(UD) project. We explain how the corpus
was prepared and describe some Welsh-specific
constructions that require special attention. The
treebank currently contains 10 756 tokens.
A 10-fold cross-validation shows that
results for both tagging and dependency
parsing are similar to those of other treebanks of
comparable size, notably the other Celtic
language treebanks within the UD project.