Computational historical linguistics and language diversity in South Asia


16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: South Asia is home to a great many languages, most of which still lack access to the language technologies that have emerged as NLP/CL has matured. This linguistic diversity, however, also creates a research environment conducive to the study of comparative, contact, and historical linguistics, fields that require gathering extensive data from many languages. We claim that data scatteredness (rather than scarcity) is the primary obstacle to the development of South Asian language technology, and suggest that the study of language history is uniquely suited to surmounting this obstacle. We review recent developments in South Asian NLP and historical-comparative linguistics, and at their intersection, explaining our current efforts in this area while also offering new paths towards breaking the data barrier.
