Team 21 - Entity Matters: A Comparative Study of Machine Translation Fidelity in Large Language Models

Indian Institute of Science Summer 2025 DA225o, Submission 13 Authors

07 Jun 2025 (modified: 24 Jun 2025) · CC BY 4.0
Keywords: Entity-Aware Machine Translation, Multilingual NLP, Prompt Engineering, Supervised Fine-tuning, Large Language Models, Crosslingual Optimized Metric for Evaluation of Translation (COMET), Manual Entity Translation Accuracy (M-ETA), SemEval 2025
TL;DR: We compare how well large language models preserve named entities during translation using prompting, fine-tuning, and specialized evaluation metrics.
Abstract: Translating named entities is challenging for traditional machine translation systems, since such entities often carry cultural or domain-specific references that do not translate literally. This limits the effectiveness of these systems in real-world scenarios. This paper draws inspiration from SemEval 2025 Task 2 on Entity-Aware Machine Translation, in which an input sentence containing named entities must be translated from English into multiple target languages. We attempt the task with a range of techniques and present our findings. We use prompt engineering to evaluate closed-source models such as Google's Gemini and OpenAI's ChatGPT. For open-source models such as Gemma, Llama, and Qwen, we apply supervised fine-tuning complemented with prompt engineering strategies. Translation quality is evaluated with the Crosslingual Optimized Metric for Evaluation of Translation (COMET; Rei et al., 2020) and Manual Entity Translation Accuracy (M-ETA; Conia et al., 2024). We document the experiments conducted across these systems and techniques, and present the results together with a detailed comparative analysis.
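To make the entity-level evaluation concrete, the sketch below computes an M-ETA-style entity translation accuracy: the fraction of system outputs that contain the gold target-language rendering of the entity. This is a simplified illustration, not the official SemEval scorer; the case-insensitive substring matching rule and the toy Italian examples are assumptions made for clarity.

```python
# Illustrative sketch (NOT the official M-ETA scorer): entity translation
# accuracy as the fraction of outputs containing the gold entity string.
def entity_translation_accuracy(outputs, gold_entities):
    """outputs: system translations; gold_entities: gold target-language
    entity strings, aligned by index. Matching rule (case-insensitive
    substring) is a simplifying assumption."""
    assert len(outputs) == len(gold_entities)
    hits = sum(
        1 for out, ent in zip(outputs, gold_entities)
        if ent.lower() in out.lower()
    )
    return hits / len(outputs) if outputs else 0.0

# Toy example: English-to-Italian outputs for "The Eiffel Tower is in Paris."
outputs = [
    "La Torre Eiffel si trova a Parigi.",   # entity rendered correctly
    "La torre di Eiffel è a Parigi.",       # entity rendered differently
]
gold = ["Torre Eiffel", "Torre Eiffel"]
print(entity_translation_accuracy(outputs, gold))  # 0.5
```

In practice, M-ETA as described by Conia et al. (2024) relies on curated gold entity translations rather than a simple substring test, so this sketch only conveys the shape of the metric.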
Submission Number: 13