Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World

Anonymous

16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission, Readers: Everyone
Abstract: Linguistic disparity in the NLP world is a problem that has been widely acknowledged recently. However, the different facets of this problem, and the reasons behind the disparity, are seldom discussed within the NLP community. This paper provides a comprehensive analysis of the disparity that exists among the languages of the world. Using an existing language categorisation based on speaker population and vitality, we analyse the distribution of language data resources, the amount of NLP/CL research, inclusion in multilingual web-based platforms, and inclusion in pre-trained multilingual models. We show that many languages are not covered by these resources or platforms, and that there is wide disparity even among languages belonging to the same language group. We analyse the impact of the family, geographical location, and speaker population of languages, provide possible reasons for this disparity, and argue that a solution to this problem should be orchestrated by a wide alliance of stakeholders, of which ACL, as an association, should be a key partner.