I Speak for the Árboles: Developing a Dependency Treebank for Spanish L2 and Heritage Speakers

Published: 22 Jun 2025, Last Modified: 22 Jun 2025 | ACL-SRW 2025 Poster | CC BY 4.0
Keywords: Spanish learner morphosyntax, dependency annotations, parsing evaluation
TL;DR: We present the first UD-style syntactic annotations for Spanish learner data using the COWSL2H corpus, adapting the framework to account for learner-specific features and evaluating parser performance.
Abstract: We introduce the first set of Universal Dependencies (UD) annotations for Spanish learner writing from the UC Davis COWSL2H corpus. Our annotations include lemmatization, POS tagging, and syntactic dependencies. We adapt the existing UD framework for Spanish L1 to account for learner-specific features such as code-switching and non-canonical syntax. A suite of parsing evaluation experiments shows that parsers trained on learner data combined with moderate amounts of Spanish L1 data can yield reasonable performance. Our annotations and parsers will be openly available to encourage future development of learner-oriented language technologies.
Archival Status: Archival
Paper Length: Short Paper (up to 4 pages of content)
Submission Number: 194