Abstract: Translating social media comments has recently attracted researchers' attention for two reasons. These comments are hard to understand and represent, and they need to be translated into other target languages. In the present work, we attempt two approaches to translating Facebook comments: one that uses a language identifier and one that does not. We also attempt to handle some forms of spelling variation in these comments in order to improve translation quality, using state-of-the-art statistical machine translation techniques. Our approach generates n-best lists for the source-language side of the training dataset. This addresses spelling variation in the comments and also enriches the resources available for translation. A small in-domain dataset can further boost the performance of the translation system. Our translation task focuses on Hindi-English mixed comments collected from Facebook. Our systems improve translation quality over the baseline system in terms of automatic evaluation scores.