Abstract: Machine translation systems have advanced significantly across many tasks, which raises the question of how well they perform on low-resource languages, particularly Indo-Aryan languages such as Urdu. This study examines the challenges that machine translation systems face when working with Urdu, a low-resource Indo-Aryan language. We conduct a comprehensive evaluation of three models: GPT-3.5, a large language model; opus-mt-en-ur, a publicly available bilingual English-Urdu translation model; and IndicTrans2, a translation model specialized for Indian languages, including low-resource ones. Our results show that IndicTrans2 outperforms the other two models, which suggests its potential for low-resource language translation. This study also identifies the specific challenges these models encounter in Urdu translation, offering insights for future improvements in machine translation for low-resource Indo-Aryan languages.
Paper Type: short
Research Area: Machine Translation
Contribution Types: Approaches to low-resource settings
Languages Studied: English, Urdu
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.