Keywords: matrix language frame theory, system morphemes, matrix language prediction
TL;DR: Several novel approaches are introduced for discovering system morphemes, based on the System Morpheme Principle used for Matrix Language determination in the Matrix Language Frame theory.
Abstract: Code-switching (CS) is a complex process in which speakers alternate between several languages. To better describe CS speech, the Matrix Language Frame (MLF) theory introduces the concept of a Matrix Language (ML), the language that provides the grammatical structure for a CS sentence. In this work, several novel approaches for discovering system morphemes based on the MLF theory are introduced. Deterministic and predictive variations of the System Morpheme Principle (SMP) were developed to discover system morphemes through the tasks of ML determination and prediction. The Morpheme Order Principle (MOP) from the MLF theory was used to assess the ML determination performance of the two SMP implementations. The deterministic approach revealed a correlation between the conventional system morphemes (pronouns, conjunctions, determiners, auxiliaries) and token frequencies averaged per Part of Speech (POS). It also revealed a ranking of the POS with respect to the ML determination task, showing the importance of particles and adpositions. Using monolingual data to discover the POS that act as system morpheme types led to a 0.07 increase in Matthews Correlation Coefficient (MCC) over the baseline for SEAME and a 0.04 increase for Miami. A predictive SMP was also trained and achieved a 0.03 MCC increase, demonstrating the advantages of the statistical analysis of the linguistic properties of the data in the deterministic SMP. This study provides valuable insight into the properties of tokens in relation to their grammatical categories in CS data.
Submission Number: 4
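
The MCC values quoted in the abstract follow the standard definition of the Matthews Correlation Coefficient for a binary decision. As a minimal illustration, assuming ML determination is scored as a two-way choice per utterance, the sketch below computes MCC from reference and predicted ML labels. The function name `mcc`, the label strings, and the toy data are hypothetical and are not taken from the paper or its datasets.

```python
import math


def mcc(y_true, y_pred, positive):
    """Matthews Correlation Coefficient for a binary labelling.

    `positive` names the label treated as the positive class;
    every other label is treated as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (hypothetical labels): reference ML vs. ML chosen by some rule.
reference = ["en", "en", "zh", "zh"]
predicted = ["en", "zh", "zh", "zh"]
score = mcc(reference, predicted, positive="en")
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level agreement, so the increases reported in the abstract are differences on that scale.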