Abstract: Low-resource languages pose a major challenge for developing machine translation systems because large,
accurate parallel datasets are unavailable. In
the present work, factored neural machine
translation systems have been developed for
the following bidirectional language pairs:
English & Bhojpuri, English & Magahi, and English & Sindhi, along with the unidirectional
language pair English - Latvian. Both the
lemma and part-of-speech (PoS) tags are included as factors alongside the surface-level English
words. No factoring has been done on the
low resource language side. The submitted
systems have been developed with the parallel datasets provided and no additional parallel or monolingual data have been included.
All seven systems have been evaluated by
the LoResMT 2019 organizers in terms of
BLEU score, precision, recall and F-measure. It is observed that
better evaluation scores have been obtained
in those MT systems in which English is the
target language. The reason behind this is
that the incorporation of lemma and PoS tag
factors for English words has improved the
vocabulary coverage and has also helped in
generalization. It is expected that incorporating linguistic factors for the low-resource
language words would likewise improve the
evaluation scores of the MT systems that have those languages on the target side.
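
To illustrate what attaching lemma and PoS factors to surface-level English words can look like, the following is a minimal sketch only. It assumes a pipe-separated `word|lemma|PoS` token layout, which is one common convention for factored input; the abstract does not state the format, tagger, or toolkit actually used. The annotated sentence, the tag names, and the helper functions `to_factored` and `from_factored` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical hand-annotated English sentence: (surface word, lemma, PoS tag).
# In practice these factors would come from a lemmatizer and a PoS tagger.
ANNOTATED: List[Tuple[str, str, str]] = [
    ("The", "the", "DET"),
    ("cats", "cat", "NOUN"),
    ("were", "be", "AUX"),
    ("sleeping", "sleep", "VERB"),
]


def to_factored(tokens: List[Tuple[str, str, str]], sep: str = "|") -> str:
    """Join each token's factors with `sep` and the tokens with spaces."""
    return " ".join(sep.join(factors) for factors in tokens)


def from_factored(line: str, sep: str = "|") -> List[Tuple[str, ...]]:
    """Split a factored line back into one tuple of factors per token."""
    return [tuple(token.split(sep)) for token in line.split()]


factored = to_factored(ANNOTATED)
print(factored)

# Distinct surface forms can share a lemma, which is one way factors
# may help vocabulary coverage and generalization.
lemmas = {lemma for _, lemma, _ in ANNOTATED}
```

In this layout the low-resource side of each sentence pair would stay as plain surface words, matching the statement that no factoring was done on that side.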