A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining

Hassan Sajjad, Alexander M. Fraser, Helmut Schmid

2012 (modified: 13 Nov 2022), ACL (1) 2012

Abstract: We propose a novel model to automatically extract transliteration pairs from parallel corpora. Our model is efficient and language-pair independent, and it mines transliteration pairs in a consistent fashion in both unsupervised and semi-supervised settings. We model transliteration mining as an interpolation of transliteration and non-transliteration sub-models. We evaluate on NEWS 2010 shared task data and on parallel corpora and achieve competitive results.
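The interpolation idea can be illustrated with a minimal sketch. It assumes a linear interpolation with a single weight (called `lam` here, a hypothetical name) and treats the two sub-model probabilities of each word pair as fixed inputs, because the abstract does not specify how the sub-models are built or trained. The sketch estimates only the weight with EM and labels a pair as a transliteration when its posterior under the non-transliteration sub-model is below an assumed threshold of 0.5. It does not reproduce the paper's actual training procedure.

```python
from typing import List, Tuple

# Each pair is (p_tr, p_ntr): the probability of a word pair under the
# transliteration and the non-transliteration sub-model (assumed given).
Pair = Tuple[float, float]


def mixture_prob(p_tr: float, p_ntr: float, lam: float) -> float:
    """Interpolated probability of a word pair: (1 - lam) * p_tr + lam * p_ntr."""
    return (1.0 - lam) * p_tr + lam * p_ntr


def posterior_ntr(p_tr: float, p_ntr: float, lam: float) -> float:
    """Posterior probability that the pair came from the non-transliteration sub-model."""
    return lam * p_ntr / mixture_prob(p_tr, p_ntr, lam)


def estimate_lambda(pairs: List[Pair], lam: float = 0.5, iters: int = 50) -> float:
    """EM for the interpolation weight only; the sub-model probabilities stay fixed.

    E-step: compute each pair's posterior of being a non-transliteration.
    M-step: the new weight is the average of those posteriors.
    """
    for _ in range(iters):
        lam = sum(posterior_ntr(pt, pn, lam) for pt, pn in pairs) / len(pairs)
    return lam


def mine(pairs: List[Pair], lam: float, threshold: float = 0.5) -> List[bool]:
    """Return True for pairs classified as transliterations (hypothetical threshold)."""
    return [posterior_ntr(pt, pn, lam) < threshold for pt, pn in pairs]


if __name__ == "__main__":
    # Toy sub-model probabilities for three candidate word pairs.
    pairs = [(1e-3, 1e-6), (1e-7, 1e-4), (1e-4, 1e-5)]
    lam = estimate_lambda(pairs)
    print(round(lam, 3), mine(pairs, lam))
```

Because the weight is re-estimated from the data rather than set by hand, the same procedure needs no labelled pairs, which matches the unsupervised setting described in the abstract.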
