GuyLingo: The Republic of Guyana Creole Corpora

Anonymous

16 Dec 2023, ACL ARR 2023 December Blind Submission, Readers: Everyone
Abstract: While major languages often enjoy substantial attention and resources, the linguistic diversity across the globe encompasses a multitude of smaller, indigenous, and regional languages that lack the same level of computational support. One such region is the Caribbean. While commonly labeled as "English speaking," the ex-British Caribbean region consists of a myriad of Creole languages and dialects thriving alongside English. In this paper, we present GuyLingo: a comprehensive corpus designed for advancing NLP research in the domain of Creolese, the most widely spoken language in the culturally rich nation of Guyana. We first outline our framework for gathering and digitizing this diverse corpus, inclusive of colloquial expressions, idioms, and regional variations. We then demonstrate, alongside discussions with Creolese experts, the challenges of training and evaluating machine translation models for Creolese. Lastly, we discuss the unique opportunities presented by recent NLP advancements for accelerating the formal adoption of Creole languages in the Caribbean.
Paper Type: short
Research Area: Special Theme (conference specific)
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Creolese, English