Keywords: fairness, evaluation, multilingual NLP / multilinguality, representation learning for language data, statistical comparisons, Double Descent, conditional language modeling, data-centric approach, diversity in AI, morphology, Transformer, meta evaluation, visualization or interpretation of learned representations, character encoding, internationalization and localization, robustness, statistical science for NLP, science in the era of AI/DL (AIxScience), transdisciplinarity
Abstract: We perform systematically and fairly controlled experiments with the 6-layer Transformer to investigate the hardness of conditional language modeling in languages that have traditionally been considered morphologically rich (AR and RU) and poor (ZH). We evaluate through statistical comparisons across all 30 possible language directions among the 6 languages of the United Nations Parallel Corpus, across 5 data sizes, on 3 representation levels: character, byte, and word. Results show that performance is relative to the representation granularity of each language, not to the language as a whole. On the character and byte levels, we are able to eliminate statistically significant performance disparity, demonstrating that a language cannot be intrinsically hard. The disparity that mirrors the morphological complexity hierarchy is shown to be a byproduct of word segmentation. Evidence from data statistics, together with the fact that word segmentation is qualitatively indeterminate, renders the decades-long debate on morphological complexity irrelevant in the context of computing (unless morphology is being intentionally modeled in a word-based, meaning-driven context). The intent of our work is to help bring more objectivity and adequacy to evaluation, as well as fairness and inclusivity to experimental setup, in the area of language and computing, so as to uphold diversity in Machine Learning and Artificial Intelligence research. Multilinguality is real and relevant in computing not because of canonical, structural linguistic concepts such as morphology or "words" in our minds, but because of standards related to internationalization and localization, such as character encoding, which have thus far been sorely overlooked in our discourse and curricula.
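To make the experimental grid and the three representation levels concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the six UN languages are labeled with ISO 639-1 codes, counts characters as Unicode code points and bytes as UTF-8 bytes, and uses whitespace splitting as a naive, hypothetical stand-in for word segmentation. The helper names `language_directions` and `granularity` and the sample strings are illustrative only. Note how the naive "word" count collapses for ZH, which has no spaces, illustrating why word-level granularity depends on a segmentation choice rather than on the language itself.

```python
from itertools import permutations

# The six official UN languages in the UN Parallel Corpus (ISO 639-1 codes).
UN_LANGUAGES = ["ar", "en", "es", "fr", "ru", "zh"]


def language_directions(langs):
    """Return all ordered (source, target) pairs with source != target."""
    return list(permutations(langs, 2))


def granularity(text):
    """Sequence length of `text` at three representation levels.

    char: Unicode code points
    byte: UTF-8 bytes
    word: whitespace-delimited tokens (a naive, hypothetical stand-in
          for word segmentation; not the segmenter used in the paper)
    """
    return {
        "char": len(text),
        "byte": len(text.encode("utf-8")),
        "word": len(text.split()),
    }


if __name__ == "__main__":
    print(len(language_directions(UN_LANGUAGES)))  # 30 = 6 * 5
    # Hypothetical sample strings, one per script of interest.
    samples = {
        "en": "hello world",
        "ru": "привет мир",
        # "مرحبا بالعالم" written with escapes to pin the exact code points
        "ar": "\u0645\u0631\u062d\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645",
        "zh": "你好世界",
    }
    for lang, text in samples.items():
        print(lang, granularity(text))
```

Under these assumptions, Cyrillic and Arabic letters each take 2 UTF-8 bytes and the CJK characters here take 3, so the same content yields different sequence lengths depending on the level chosen.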
One-sentence Summary: We investigate performance disparity in multilingual NLP with Transformer conditional LMs and find that, in the context of computing, morphological complexity is a byproduct of word segmentation and the disparity arising therefrom is unwarranted.