A Word-Splitting Approach to Sanskrit Sandhi Words of Kannada Useful in Effective English Translation
Abstract: Natural Language Processing is a field of artificial intelligence that facilitates human-machine interaction through vernacular languages. The Kannada language has two types of Sandhi: Kannada Sandhi and Sanskrit Sandhi. A Sandhi word is a morphophonemic form produced when two words or distinct morphemes are joined. Sandhi splitting is the reverse of this formation process. Sandhi formation is rule-governed in all the Dravidian languages. A rule-based splitting method is developed to obtain the constituent words from the Sanskrit Sandhi (SS) words in Kannada sentences. Once the SS words are split, the type of Sandhi is also identified, leading to an effective translation of the SS words into English. This paper covers seven types of SS words: SavarNadeergha, YaN, GuNa, Vruddhi, Jatva, Shchutva and Anunasika Sandhi. The split points are identified according to the Sandhi rules. The proposed method is assessed on a dataset of 4900 SS words occurring in Kannada sentences, where it achieves an accuracy of 90.03% in Sanskrit Sandhi identification and 85.87% in acceptable English translation. The work can find application in other Dravidian languages.
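The idea of reversing Sandhi formation by rule can be illustrated with a minimal sketch. It is not the authors' implementation: the romanization (long vowels written doubled, e.g. `aa`), the rule table, the toy lexicon, and the function name `split_sandhi` are all assumptions made for illustration, and only four of the seven Sandhi types named in the abstract (the vowel Sandhis) are covered.

```python
# Hypothetical sketch of rule-based Sandhi splitting on romanized text.
# Assumptions: long vowels are written doubled ("aa", "ii", "uu"), and a
# small lexicon is available to validate candidate constituents.

VOWELS = "aeiou"

# Each reverse rule: (surface form at the join, ending restored on the left
# word, beginning restored on the right word, Sandhi type).
REVERSE_RULES = [
    ("aa", "a", "a", "SavarNadeergha"),
    ("aa", "a", "aa", "SavarNadeergha"),
    ("aa", "aa", "a", "SavarNadeergha"),
    ("ii", "i", "i", "SavarNadeergha"),
    ("uu", "u", "u", "SavarNadeergha"),
    ("e", "a", "i", "GuNa"),
    ("o", "a", "u", "GuNa"),
    ("ai", "a", "e", "Vruddhi"),
    ("au", "a", "o", "Vruddhi"),
    ("y", "i", "", "YaN"),  # i + dissimilar vowel -> y + vowel
    ("v", "u", "", "YaN"),  # u + dissimilar vowel -> v + vowel
]

# Toy lexicon (illustrative only).
LEXICON = {"deva", "aalaya", "indra", "suurya", "udaya", "eka", "ati", "anta"}


def split_sandhi(word, lexicon=LEXICON):
    """Return (left, right, sandhi_type) for every rule-consistent split
    whose two constituents are both found in the lexicon."""
    results = []
    for i in range(1, len(word)):
        for surface, left_end, right_start, sandhi_type in REVERSE_RULES:
            if not word.startswith(surface, i):
                continue
            left = word[:i] + left_end
            right = right_start + word[i + len(surface):]
            # YaN applies only when a vowel follows the semivowel.
            if sandhi_type == "YaN" and right[:1] not in VOWELS:
                continue
            if left in lexicon and right in lexicon:
                results.append((left, right, sandhi_type))
    return results


if __name__ == "__main__":
    for w in ["devaalaya", "devendra", "suuryodaya", "ekaika", "atyanta"]:
        print(w, split_sandhi(w))
```

Checking candidates against a lexicon is one simple way to reject splits that are rule-consistent but not real words; the abstract does not say how the paper validates its split points.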
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Sandhi Splitting, Rule-based Methodology, Transliteration, Natural Language Processing
Contribution Types: NLP engineering experiment
Languages Studied: KANNADA, ENGLISH
Submission Number: 329