TwittIrish: A Universal Dependencies Treebank of Tweets in Modern Irish

Anonymous

16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: Modern Irish is a minority language lacking sufficient linguistic resources for accurate automatic syntactic parsing of user-generated content. As with other languages, the linguistic style observed in Irish tweets differs, in terms of orthography, lexicon and syntax, from that of the standard texts more commonly used in Natural Language Processing (NLP) to develop language models and parsers. This paper reports on the development of TwittIrish, the first Irish Universal Dependencies Twitter Treebank. We describe our bootstrapping method and report on preliminary parsing experiments.