Abstract: Modern Irish is a minority language lacking sufficient linguistic resources for accurate automatic syntactic parsing of user-generated content. As with other languages, the linguistic style observed in Irish tweets differs, in terms of orthography, lexicon and syntax, from that of the standard texts more commonly used in Natural Language Processing (NLP) to develop language models and parsers. This paper reports on the development of TwittIrish, the first Irish Universal Dependencies Twitter Treebank. We describe our bootstrapping method and report on preliminary parsing experiments.