Keywords: pro-drop, The Little Prince, parallel corpora, cross-lingual transfer, multilingual encoders
Abstract: Language models typically learn from unannotated corpora, yet their ability to acquire abstract syntactic parameters like pro-drop through cross-lingual transfer remains an open question. Analogous to human second language acquisition, we examine whether models can leverage annotated data from a source language to induce syntax in a target language. We evaluate two multilingual encoder models using an annotated parallel corpus of The Little Prince across English, Spanish, Korean, and Chinese, comparing zero-shot baselines with in-language fine-tuning and cross-lingual transfer. While supervised fine-tuning consistently improves performance, cross-lingual transfer yields inconsistent results across language pairs. Notably, transfer between Spanish and Chinese results in adverse effects, suggesting difficulty reconciling morphologically licensed pro-drop with topic-drop. Our findings suggest that language models may learn language-specific licensing strategies rather than a universal syntactic parameter, as cross-lingual exposure does not always facilitate positive transfer.
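The abstract contrasts three conditions: zero-shot evaluation, in-language fine-tuning, and cross-lingual transfer. Below is a minimal sketch of the transfer condition only, under stated assumptions: the paper does not name the two encoders or specify the task formulation, so the XLM-R model, the sentence-level pro-drop classification head, and the toy labeled examples are all illustrative, not the authors' actual setup.

    # Sketch: fine-tune a multilingual encoder on source-language pro-drop
    # annotations, then evaluate zero-shot on the target language.
    # Assumptions (not from the paper): xlm-roberta-base as the encoder,
    # binary sentence-level labels (1 = dropped subject, 0 = overt subject).
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_NAME = "xlm-roberta-base"  # hypothetical choice of encoder
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=2
    )

    # Toy stand-ins for aligned corpus slices (sentence, has_dropped_subject).
    source_train = [("Quiero verte.", 1), ("El principito sonrió.", 0)]  # Spanish
    target_eval = [("想见你。", 1), ("小王子笑了。", 0)]                  # Chinese

    def encode(examples):
        # Tokenize a batch of (text, label) pairs into model inputs.
        texts, labels = zip(*examples)
        enc = tokenizer(list(texts), padding=True, truncation=True,
                        return_tensors="pt")
        enc["labels"] = torch.tensor(labels)
        return enc

    # Cross-lingual transfer: supervision comes from the source language only.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(3):
        out = model(**encode(source_train))
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Zero-shot evaluation on the target language.
    model.eval()
    with torch.no_grad():
        preds = model(**encode(target_eval)).logits.argmax(dim=-1)
    print("target-language predictions:", preds.tolist())

The in-language fine-tuning condition would replace source_train with target-language annotations; the zero-shot baseline skips the training loop entirely.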
Paper Type: Short
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories, cognitive modeling, computational psycholinguistics
Contribution Types: Theory
Languages Studied: English, Spanish, Chinese, Korean
Submission Number: 9112