Keywords: machine translation, cultural adaptability, LLM-as-a-Judge
Abstract: Large language models (LLMs) have achieved strong performance in general machine translation, yet their capabilities in culture-aware scenarios remain poorly understood.
To bridge this gap, we introduce CanMT, a Culture-Aware Novel-Driven Parallel Dataset for Machine Translation, together with a theoretically grounded, multi-dimensional evaluation framework for assessing cultural translation quality.
Leveraging CanMT, we systematically evaluate a wide range of LLMs and translation systems under different translation strategy constraints.
Our findings reveal substantial performance disparities across models and demonstrate that translation strategies exert a systematic influence on model behavior. Further analysis shows that translation difficulty varies across types of culture-specific items, and that a persistent gap remains between models’ recognition of culture-specific knowledge and their ability to correctly operationalize it in translation outputs.
In addition, incorporating reference translations is shown to substantially improve the reliability of LLM-as-a-judge evaluation, underscoring their essential role in assessing culture-aware translation quality.
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Machine Translation, Cultural Analytics, NLP for Social Good
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English, Chinese
Submission Number: 9593