Developing Language Technology and NLP tools for endangered languages: Torwali

Published: 02 Aug 2024, Last Modified: 12 Nov 2024, Venue: WiNLP 2024, License: CC BY 4.0
Keywords: NLP, Morphological analysis, Word segmentation, Transliteration, Endangered languages.
Abstract: Torwali [ISO 639-3: trw] is an endangered indigenous language spoken in northern Pakistan. It is a low-resource language written right-to-left (RTL) in the Perso-Arabic script. This paper discusses the challenges of processing Torwali and the NLP approaches used to develop tools and resources for it. The work contributes to morphological analysis, word segmentation, POS tagging, and transliteration of Torwali. It can serve as a resource for other lexically similar endangered languages of northern Pakistan, and it will help to improve the digital presence of Torwali and to safeguard the language against endangerment.
Submission Number: 9