A Joint Source Channel Model for the English to Bengali Back Transliteration

Tirthankar Dasgupta; Manjira Sinha; Anupam Basu

Published: 01 Jan 2013, Last Modified: 29 Jul 2025, Venue: MIKE 2013, License: CC BY-SA 4.0

Abstract: In this paper we present an English-to-Bengali back-transliteration system that converts Bengali text written in Romanized English back to its original script. The proposed system uses a bilingual parallel corpus of English-Bengali transliterated word pairs and applies both orthographic and phonetic information to two computational models, the joint source channel model and the trigram model, to automatically identify, extract, and learn transliteration unit (TU) pairs from the source and target language words. The system then predicts the 10 best candidate outputs for a given input text. We further extend our work to make the target word prediction module more robust through phonological analysis of the generated target sentence. Both models were evaluated on a set of 2000 Romanized Bengali test words. Our initial evaluation results show that the joint source channel model performs much better than the trigram model.
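
The abstract does not give the model equations, so the following is only a minimal sketch of how a joint source channel model is commonly set up: the word is treated as a sequence of TU pairs <source unit, target unit>, and each pair is scored conditioned on the previous pair. It is not the authors' implementation. The bigram context, the linear interpolation with unigram counts, the pre-segmented input, the function names (`train`, `pair_prob`, `transliterate`), and the five-word toy corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import heapq

START = ("<s>", "<s>")  # hypothetical start-of-word marker pair


def train(aligned_words):
    """Count TU pairs and TU-pair bigrams from pre-aligned words."""
    unigram = Counter()
    bigram = defaultdict(Counter)
    for pairs in aligned_words:
        prev = START
        for pair in pairs:
            unigram[pair] += 1
            bigram[prev][pair] += 1
            prev = pair
    return unigram, bigram


def pair_prob(unigram, bigram, prev, pair, lam=0.8):
    """P(pair | prev), interpolated with the unigram estimate (assumed smoothing)."""
    p_uni = unigram[pair] / sum(unigram.values())
    ctx_total = sum(bigram[prev].values())
    p_bi = bigram[prev][pair] / ctx_total if ctx_total else 0.0
    return lam * p_bi + (1 - lam) * p_uni


def transliterate(source_units, unigram, bigram, k=10):
    """Beam search over target units; returns up to k (word, score) candidates."""
    targets = defaultdict(set)
    for s, t in unigram:
        targets[s].add(t)
    beam = [(1.0, START, "")]
    for s in source_units:
        expanded = []
        for prob, prev, out in beam:
            for t in targets.get(s, ()):
                pair = (s, t)
                score = prob * pair_prob(unigram, bigram, prev, pair)
                expanded.append((score, pair, out + t))
        beam = heapq.nlargest(k, expanded, key=lambda item: item[0])
    return [(out, prob) for prob, _, out in beam]


# Toy, hand-aligned corpus (illustrative only, not the paper's data).
toy_corpus = [
    [("ma", "মা"), ("ma", "মা")],
    [("ba", "বা"), ("ba", "বা")],
    [("ka", "কা"), ("ka", "কা")],
    [("ma", "মা"), ("la", "লা")],
    [("ka", "ক"), ("la", "লা")],
]

unigram, bigram = train(toy_corpus)
for word, score in transliterate(["ka", "la"], unigram, bigram):
    print(word, round(score, 4))
```

In this sketch the source unit "ka" has two possible target units, and the choice between them is driven by which TU pair was seen before "la" in the toy corpus. A real system would also need to learn the segmentation and alignment of TUs from the parallel corpus, which is skipped here.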
