A Distributional Diagnostic of Categorical Stability in a Romanian Treebank

Published: 27 May 2026, Last Modified: 27 May 2026
Venue: UniDive 2026
Readers: Everyone
License: CC BY-SA 4.0
Keywords: Universal Dependencies, Categorical Gradience, POS Tag Stability, Romanian Syntax
Working Group: WG1: Corpus annotation, WG2: Lexicon-corpus interface
WG1 Tasks: Task 1.1: Linguistic typology and multilingual corpus annotation, Task 1.4: Sharing tools, formats, and infrastructure, Task 1.3: Extensions and updates to morphosyntactic annotation guidelines
Abstract: While part-of-speech (POS) tags are traditionally treated as discrete silos, distributional evidence reveals a complex syntactic continuum that is often flattened by the rigid standardisation of Universal Dependencies (UD). We propose a quantitative framework to map these fuzzy boundaries by constructing morphosyntactic lexeme profiles. By deploying word-level diagnostics (Purity and Entropy) alongside category-level stability metrics, we identify "bridge words" that distributionally traverse categories and pinpoint the specific morphosyntactic features driving these shifts. A case study on the UD_Romanian-RRT treebank demonstrates a stark divide between highly stable content classes and deeply gradient functional categories. Our results suggest that categorical instability often reflects genuine distributional syncretism rather than mere annotation error, providing a data-driven basis to evaluate how well the UD schema mirrors linguistic reality. Ultimately, this work establishes a robust diagnostic for annotation bias, enhancing both treebank quality and model interpretability within the UD framework.
WG2 Tasks: Task 2.1: Cross-language unification of lexical features
Tracks For Type Of Contribution: Work in progress
Do You Need Visa To Attend The 4th UniDive General Meeting In Romania: No
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 29
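Illustrative sketch: the abstract names two word-level diagnostics, Purity and Entropy, but does not define them. The Python below shows one plausible reading, offered as an assumption rather than the authors' method: Purity is taken as the share of a lemma's occurrences carrying its most frequent UPOS tag, and Entropy as the Shannon entropy (in bits) of its tag distribution. The function names, the reduction of a "lexeme profile" to a bare tag count, and the toy token list are all hypothetical; the counts are invented and not drawn from UD_Romanian-RRT.

```python
import math
from collections import Counter, defaultdict


def tag_profiles(tokens):
    """Build a per-lemma Counter of UPOS tags from (lemma, upos) pairs."""
    profiles = defaultdict(Counter)
    for lemma, upos in tokens:
        profiles[lemma][upos] += 1
    return profiles


def purity(counts):
    """Assumed definition: share of occurrences with the dominant tag."""
    total = sum(counts.values())
    return max(counts.values()) / total


def entropy(counts):
    """Assumed definition: Shannon entropy (bits) of the tag distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Invented toy data, for illustration only.
tokens = [
    ("și", "CCONJ"), ("și", "CCONJ"), ("și", "ADV"), ("și", "ADV"),
    ("casă", "NOUN"), ("casă", "NOUN"),
]

profiles = tag_profiles(tokens)

# Candidate "bridge words" under this reading: lemmas attested with more
# than one tag, i.e. purity strictly below 1.
bridge_candidates = sorted(
    lemma for lemma, counts in profiles.items() if purity(counts) < 1.0
)
```

Under these assumed definitions, a lemma that always carries one tag has purity 1 and entropy 0, while a lemma split evenly between two tags has purity 0.5 and entropy 1 bit. The profiles described in the abstract presumably also include morphosyntactic features, which this sketch leaves out.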