Constructing Multilingual CCG Treebanks from Universal Dependencies

Anonymous

17 Sept 2021 (modified: 05 May 2023), ACL ARR 2021 September Blind Submission
Abstract: This paper introduces an algorithm to convert Universal Dependencies (UD) treebanks to Combinatory Categorial Grammar (CCG) treebanks. Because CCG encodes almost all grammatical information in the lexicon, obtaining a high-quality CCG derivation from a dependency tree is challenging. Our algorithm consists of four main steps: binarization of dependency trees, functor/argument identification, category assignment through hand-crafted rules, and category inference for unassigned constituents. To evaluate the converted treebanks, we perform lexical, sentential, and syntactic rule coverage analyses, as well as CCG parsing experiments. We achieve a conversion rate above 80% on 68 treebanks covering 44 languages, and lexical coverage above 90% on 81 treebanks covering 52 languages.
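The abstract names the pipeline steps but not their details. The following is a minimal sketch, under our own assumptions, of how binarization, functor/argument identification, and category inference can fit together; it is not the paper's implementation. All names (`DepNode`, `Binary`, `binarize`, `head_is_functor`, `infer_functor`, `ARGUMENT_RELS`) are hypothetical. The attachment order (nearest dependents first, right before left), the set of relations treated as arguments, and the atomic categories S, NP, and N are illustrative choices only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

# Hypothetical argument relations: for these the head is the functor and the
# dependent is its argument; any other relation is treated as a modifier functor.
ARGUMENT_RELS = {"nsubj", "obj", "iobj", "csubj", "ccomp", "xcomp"}


@dataclass
class DepNode:
    """A token in a dependency tree (minimal illustrative structure)."""
    index: int                 # 1-based position in the sentence
    form: str
    deprel: str                # UD relation to the head, e.g. "nsubj"
    children: list[DepNode] = field(default_factory=list)


@dataclass
class Binary:
    """A binary constituent formed by attaching one dependent to its head."""
    left: Tree
    right: Tree
    head_is_left: bool
    deprel: str                # relation of the attached (non-head) child


Tree = Union[str, Binary]


def binarize(node: DepNode) -> Tree:
    """Attach dependents one at a time, nearest first; right dependents are
    attached before left ones (an assumed order, e.g. object before subject)."""
    tree: Tree = node.form
    right = sorted((c for c in node.children if c.index > node.index), key=lambda c: c.index)
    left = sorted((c for c in node.children if c.index < node.index), key=lambda c: -c.index)
    for dep in right:
        tree = Binary(tree, binarize(dep), head_is_left=True, deprel=dep.deprel)
    for dep in left:
        tree = Binary(binarize(dep), tree, head_is_left=False, deprel=dep.deprel)
    return tree


def bracket(tree: Tree) -> str:
    """Render a binarized tree as nested brackets."""
    if isinstance(tree, str):
        return tree
    return f"({bracket(tree.left)} {bracket(tree.right)})"


def head_is_functor(deprel: str) -> bool:
    """Assumed functor/argument rule; subtypes such as nsubj:pass use the base relation."""
    return deprel.split(":")[0] in ARGUMENT_RELS


def wrap(cat: str) -> str:
    """Parenthesize a complex category when it is nested inside another one."""
    return f"({cat})" if ("/" in cat or "\\" in cat) else cat


def infer_functor(parent: str, argument: str, functor_on_left: bool) -> str:
    """Infer an unassigned functor category from its parent and argument:
    forward application (functor on the left) gives parent/argument,
    backward application (functor on the right) gives parent\\argument."""
    slash = "/" if functor_on_left else "\\"
    return f"{wrap(parent)}{slash}{wrap(argument)}"


# "She reads books": "reads" (2) is the root, "She" (1) is nsubj, "books" (3) is obj.
root = DepNode(2, "reads", "root", [DepNode(1, "She", "nsubj"), DepNode(3, "books", "obj")])
print(bracket(binarize(root)))  # (She (reads books))

# Assume hand-crafted rules fixed S at the root and NP for "She" and "books".
vp = infer_functor("S", "NP", functor_on_left=False)  # head of an nsubj argument
verb = infer_functor(vp, "NP", functor_on_left=True)  # category for "reads"
print(vp, verb)  # S\NP (S\NP)/NP
```

For a modifier relation such as `amod`, `head_is_functor` returns `False`, so the dependent becomes the functor instead: with the head and parent both assigned N, `infer_functor("N", "N", functor_on_left=True)` yields the adjective category `N/N`. Keeping inference purely local (parent plus one known child) is the design choice that lets categories propagate top-down once the hand-crafted rules have fixed a few anchors.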