Don't Take it Literally! Idiom-aware Translation via In-context Learning

ACL ARR 2025 February Submission4111 Authors

15 Feb 2025 (modified: 09 May 2025); ACL ARR 2025 February Submission; Readers: Everyone; License: CC BY 4.0
Abstract: The translation of idiomatic expressions often results in misunderstandings and inaccuracies, affecting both everyday communication and machine translation. This paper introduces Idiom-aware Translation (IDiAT), a novel framework designed to enhance idiomatic translation. As part of this work, we curate a high-quality Vietnamese-English idiom collection to provide contextual support for in-context learning (ICL) during translation. Additionally, we present the IDiAT evaluation benchmark, which includes both idiomatic and non-idiomatic text pairs to assess general translation quality and idiomatic translation performance. By leveraging ICL in large language models, IDiAT enhances few-shot demonstrations with idiom and topic descriptions, improving translation accuracy. Empirical results demonstrate that IDiAT outperforms traditional methods while requiring fewer data samples, and human evaluations confirm its effectiveness. This work advances idiomatic translation and contributes to the development of culturally aware translation systems, paving the way for future research in low-resource languages. The experimental data and code used in this paper are publicly available for research purposes.
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Machine Translation, Resources and Evaluation, Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Data resources
Languages Studied: English, Vietnamese, Japanese, Korean, Thai, Finnish, Slovenian
Submission Number: 4111
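The abstract describes IDiAT only at a high level: few-shot demonstrations for in-context learning are augmented with idiom and topic descriptions drawn from a curated Vietnamese-English idiom collection. The Python sketch below illustrates one way such an augmented prompt could be assembled. It is not the authors' implementation. The names IdiomEntry, Demo, find_idioms, describe and build_prompt, the prompt wording, and the naive substring matching are all hypothetical assumptions. How demonstrations are selected and which LLM is queried are not covered here.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass


@dataclass
class IdiomEntry:
    """Hypothetical entry of a Vietnamese-English idiom collection."""
    idiom: str    # Vietnamese idiom as it appears in running text
    meaning: str  # English gloss of its figurative meaning


@dataclass
class Demo:
    """Hypothetical few-shot demonstration: source, reference, topic."""
    source: str
    target: str
    topic: str


def _norm(text: str) -> str:
    # NFC-normalize so precomposed and combining Vietnamese diacritics compare equal,
    # then casefold for case-insensitive matching.
    return unicodedata.normalize("NFC", text).casefold()


def find_idioms(sentence: str, collection: list[IdiomEntry]) -> list[IdiomEntry]:
    """Return entries whose idiom occurs verbatim in the sentence.

    Assumption: plain substring matching. A real system would likely need
    more robust matching (word segmentation, variant or partial forms).
    """
    s = _norm(sentence)
    return [e for e in collection if _norm(e.idiom) in s]


def describe(sentence: str, topic: str, collection: list[IdiomEntry]) -> str:
    """Render the topic line plus one description line per detected idiom."""
    lines = [f"Topic: {topic}"]
    for entry in find_idioms(sentence, collection):
        lines.append(f'Idiom: "{entry.idiom}" means "{entry.meaning}"')
    return "\n".join(lines)


def build_prompt(query: str, query_topic: str, demos: list[Demo],
                 collection: list[IdiomEntry]) -> str:
    """Assemble a few-shot Vietnamese->English prompt in which every
    demonstration and the query carry topic and idiom descriptions."""
    parts = ["Translate the Vietnamese text into natural English. "
             "Render idioms by their meaning, not word by word."]
    for d in demos:
        parts.append(describe(d.source, d.topic, collection))
        parts.append(f"Vietnamese: {d.source}\nEnglish: {d.target}")
    parts.append(describe(query, query_topic, collection))
    parts.append(f"Vietnamese: {query}\nEnglish:")  # the LLM completes this line
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the IDiAT collection.
    collection = [
        IdiomEntry("Nước đến chân mới nhảy", "to act only at the last minute"),
        IdiomEntry("Ăn cháo đá bát", "to be ungrateful to those who helped you"),
    ]
    demos = [Demo("Anh ấy luôn nước đến chân mới nhảy.",
                  "He always leaves things until the last minute.",
                  "Work habits")]
    print(build_prompt("Nó ăn cháo đá bát với những người đã giúp nó.",
                       "Relationships", demos, collection))
```

Attaching the idiom gloss to each demonstration, and not only to the query, lets the model see in context how a figurative meaning maps onto a fluent English rendering, which is the kind of guidance the abstract attributes to IDiAT's enhanced demonstrations.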