Yoruba and Unicode: An Overview of a Problem

Published: 03 Mar 2023, Last Modified: 03 May 2023
Venue: AfricaNLP 2023
Readers: Everyone
Keywords: Unicode, Yoruba, Diacritics, Technology
TL;DR: An analysis of the problems caused by/with Unicode for writing in Yorùbá
Abstract: There is a recurring problem in the writing of Yorùbá on the internet (or on the computer) that has proven intractable over the years. This problem also applies to Igbo and other African languages that depend on diacritics for disambiguation. It concerns not just the application of the diacritics themselves but also the way words are eventually rendered on the screen after the diacritics have been applied or, in most cases, after the text has been transferred to a platform different from the one where it was originally written, e.g. from Microsoft Word to PDF. The work of Unicode has been implicated in this problem (a belief that has now been borne out by some facts and public confirmation), but the issue also appears to be more nuanced than just Unicode = bad. This paper discusses the problem with personal and public examples, from books and from the internet, to argue for a more holistic response to it. The paper discusses Yorùbá tonal ambiguities, covering the history of Yorùbá orthography from Ajayi Crowther through the work of Ayo Bamgbose to modern times. It covers the technology paradox through which solutions designed to promote inclusion have come to create more problems. It then examines the role of Unicode from its inception to date, and how it currently affects underserved languages like Yorùbá. The paper shows examples of books and web pages where these technology problems have caused misunderstandings and unintended consequences for intelligibility. It examines Unicode's explanations of its role in these problems, and weighs them against its work on emojis and other languages. It then mentions current solutions and interventions by others in the field, and concludes with suggestions on the way forward for languages like Yorùbá, which depend on diacritics and on well-functioning tone-marking software for intelligibility.