The Double-Edged Sword of Reasoning LLMs in Translation: Disambiguation, Hallucination, and Efficiency

ACL ARR 2026 January Submission8025 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Machine Translation, O1-Like Model, Chain of Thought, Reasoning Large Language Model
Abstract: Reasoning LLMs are transforming AI by simulating human cognitive processes, but their performance in multilingual machine translation (MMT) remains underexplored. This study examines (1) how Reasoning LLMs perform on MMT tasks and (2) which factors influence their translation quality. We evaluate multiple Reasoning LLMs and compare them with traditional LLMs such as ChatGPT and GPT-4o. Results show that Reasoning LLMs set new benchmarks in multilingual translation. They demonstrate strengths in historical and cultural translation but are prone to rambling in more challenging scenarios. Further analysis reveals three key insights: (1) high inference costs and slower processing speeds make complex translation tasks more resource-intensive; (2) translation quality improves with model size, enhancing commonsense reasoning and cultural translation; (3) the temperature parameter significantly affects output quality: lower temperatures yield more stable and accurate translations, while higher temperatures reduce coherence and precision.
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Machine Translation, Large Language Model, o1-Like LLMs
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Data analysis
Languages Studied: English
Submission Number: 8025