The Hidden Folk: Linguistic Properties encoded in Multilingual Contextual Character Representations

Anonymous

16 Dec 2022 (modified: 05 May 2023)
ACL ARR 2022 December Blind Submission
Readers: Everyone
Abstract: To gain a better understanding of the linguistic information encoded in character-based language models, we probe the multilingual contextual CANINE model. We design a range of phonetic probing tasks in six Nordic languages, including Faroese as an additional zero-shot instance. The results show that phonetic information, such as consonant voicing and vowel roundness, is indeed encoded in the character representations and that this information transfers to a similar zero-shot language.
Paper Type: short
Research Area: Interpretability and Analysis of Models for NLP