Mandarin classifier systems optimize to accommodate communicative pressures

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Findings
Submission Type: Regular Long Paper
Submission Track: Linguistic Theories, Cognitive Modeling, and Psycholinguistics
Keywords: Mandarin Chinese, classifiers, noun classes, noun class processing, word embeddings, mutual information, GAM
TL;DR: Noun-classifier combinations are sensitive to the nouns’ frequencies, their contextual similarity, and the noun-phrase (NP) context in which they appear; differential behaviors reflect the opposing communicative pressures of processing vs. learning.
Abstract: Previous work on noun classification implies that gender systems are inherently optimized to accommodate communicative pressures on human language learning and processing (Dye et al. 2017, 2018). These studies state that languages make use of either grammatical (e.g., gender) or probabilistic (e.g., pre-nominal modifiers) devices to smooth the entropy of nouns in context. We show that even languages that are considered genderless, like Mandarin Chinese, possess a noun classification device that plays the same functional role as gender markers. Based on close to 1M Mandarin noun phrases extracted from the Leipzig Corpora Collection (Goldhahn et al. 2012) and their corresponding fastText embeddings (Bojanowski et al. 2016), we show that noun-classifier combinations are sensitive to the same frequency, similarity, and co-occurrence interactions that structure gender systems. We also present the first study of the effects of the interaction between grammatical and probabilistic noun classification.
Submission Number: 4914