Neural Detection of Cross-lingual Syntactic Knowledge

Published: 01 Jan 2022, Last Modified: 21 May 2025 · IberSPEECH 2022 · CC BY-SA 4.0
Abstract: In recent years, there has been substantial progress in pretrained multilingual language models, such as mBERT and XLM-R, which are able to capture and learn linguistic knowledge from input in many languages simultaneously. However, little is known about where such models localise what they have learnt across languages. In this paper, we specifically evaluate the cross-lingual syntactic information embedded in CINO, a more recent multilingual pretrained language model. We probe CINO on Universal Dependencies treebanks of English and Mandarin Chinese with two syntax-related layer-wise evaluation tasks: part-of-speech tagging at the token level and syntax tree-depth prediction at the sentence level. The results of our layer-wise probing experiments show that token-level syntax is localised in the higher layers, consistently across the two typologically different languages, whereas sentence-level syntax is distributed across the layers in both typology-specific and language-universal ways.
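To illustrate the layer-wise probing setup described in the abstract, the sketch below extracts per-layer hidden states from a frozen encoder and attaches a linear probe on top of each layer. This is a minimal, illustrative sketch, not the authors' exact implementation: the HuggingFace checkpoint name `hfl/cino-base-v2`, the toy sentence, and the label count are assumptions for demonstration only.

```python
# Minimal layer-wise probing sketch (illustrative; model name, data, and label
# count are assumptions, not the paper's exact setup). Requires: torch, transformers.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "hfl/cino-base-v2"  # assumed HuggingFace checkpoint for CINO

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
encoder.eval()  # the encoder stays frozen; only the probe would be trained


def layer_representations(sentence: str):
    """Return the hidden states from every layer for one sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**inputs)
    # hidden_states is a tuple of (num_layers + 1) tensors,
    # each of shape (1, seq_len, hidden_dim): embeddings + every Transformer layer.
    return outputs.hidden_states


class LinearProbe(nn.Module):
    """A linear classifier over frozen representations from a single layer.

    For POS tagging the probe is applied per token; for tree-depth prediction
    it can instead be applied to a pooled sentence vector.
    """

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.classifier(hidden)


# Example: run one probe over every layer of a toy sentence. In a real probing
# experiment, one probe per layer would be trained on Universal Dependencies
# data and the per-layer accuracies compared.
hidden_states = layer_representations("The cat sat on the mat .")
probe = LinearProbe(hidden_dim=hidden_states[0].size(-1), num_labels=17)  # 17 UD UPOS tags
for layer_idx, layer_hidden in enumerate(hidden_states):
    logits = probe(layer_hidden)  # shape: (1, seq_len, num_labels)
    print(f"layer {layer_idx}: logits shape {tuple(logits.shape)}")
```

In this style of probing, the encoder weights are never updated; differences in probe accuracy across layers are taken as evidence of where the relevant syntactic information is localised.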