xrPhonetic: Akshar-based Phonetic String Similarity


17 Feb 2023 (modified: 05 May 2023)
ACL ARR 2023 February Blind Submission
Readers: Everyone
Abstract: Phonetics-based string similarity is widely used in information retrieval systems to identify words that are spelled differently but sound alike. Another common application is computing a similarity score between two words from different sources that may be alternative spellings of the same word. A common and particularly interesting case is estimating the phonetic similarity of two words transliterated into Roman script from another language. In this setting, it should be more effective to exploit knowledge of the writing system from which the words originated, because people tend to carry the nuances of that writing system over into their transliterations. We propose xrPhonetic, a novel phonetic similarity algorithm for words transliterated into Roman script from languages that use Abugida-based scripts. It treats akshars as the fundamental atomic units of words, with consonant and vowel phonemes as their sub-atomic units, and uses weighted phoneme mappings to obtain a more continuous spectrum of phonetic similarity.
Paper Type: short
Research Area: Phonology, Morphology and Word Segmentation
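
The idea in the abstract can be illustrated with a minimal sketch. This is not the xrPhonetic algorithm itself, whose details are not given here. It assumes a simplified akshar split (a consonant cluster followed by a vowel run), a small hypothetical table `PHONEME_WEIGHTS`, and a weighted edit distance over akshar units. The function names `split_akshars`, `phoneme_sim`, `unit_sim`, and `similarity` are illustrative only.

```python
import re

# Hypothetical weights (assumption, not from the paper): similarity in [0, 1]
# for romanized phoneme spellings that often alternate in transliteration.
PHONEME_WEIGHTS = {
    frozenset(("a", "aa")): 0.9,
    frozenset(("i", "ee")): 0.9,
    frozenset(("u", "oo")): 0.9,
    frozenset(("v", "w")): 0.9,
    frozenset(("k", "c")): 0.9,
    frozenset(("s", "sh")): 0.7,
}

# Simplifying assumption: an akshar-like unit is a consonant run plus a vowel run.
UNIT_RE = re.compile(r"([^aeiou]*)([aeiou]*)")


def split_akshars(word):
    """Split a romanized word into (consonant, vowel) units."""
    word = re.sub(r"[^a-z]", "", word.lower())
    return [m.groups() for m in UNIT_RE.finditer(word) if m.group(0)]


def phoneme_sim(p, q):
    """Return 1.0 for identical phonemes, else the table weight (default 0.0)."""
    if p == q:
        return 1.0
    return PHONEME_WEIGHTS.get(frozenset((p, q)), 0.0)


def unit_sim(u, v):
    """Average the consonant and vowel similarities of two units."""
    return 0.5 * (phoneme_sim(u[0], v[0]) + phoneme_sim(u[1], v[1]))


def similarity(w1, w2):
    """Weighted edit distance over akshar units, mapped to a score in [0, 1]."""
    a, b = split_akshars(w1), split_akshars(w2)
    if not a and not b:
        return 1.0
    # d[i][j] is the cheapest cost of aligning a[:i] with b[:j].
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = float(i)
    for j in range(1, len(b) + 1):
        d[0][j] = float(j)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # delete a unit
                d[i][j - 1] + 1.0,  # insert a unit
                d[i - 1][j - 1] + 1.0 - unit_sim(a[i - 1], b[j - 1]),  # substitute
            )
    return 1.0 - d[-1][-1] / max(len(a), len(b))
```

Under these assumed weights, `similarity("sita", "seeta")` scores much higher than `similarity("sita", "gopal")`, because the only difference in the first pair is the near-equivalent vowel spelling `i` versus `ee`. Because the weights are graded rather than binary, the score varies continuously instead of collapsing words into match or no-match buckets.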
