Document-level Translation with LLM Reranking: Team-J at WMT 2024 General Translation Task

Published: 01 Jan 2024, Last Modified: 18 May 2025 · WMT 2024 · CC BY-SA 4.0
Abstract: We participated in the constrained track for English-Japanese and Japanese-Chinese translation at the WMT 2024 General Machine Translation Task. Our approach was to generate a large number of sentence-level translation candidates and select the most probable translation using minimum Bayes risk (MBR) decoding and document-level large language model (LLM) re-ranking. We first generated hundreds of translation candidates from multiple translation models and retained the top 30 candidates using MBR decoding. In addition, we continually pre-trained LLMs on the target-language corpora to leverage document-level information. We then used the LLMs to select, sentence by sentence from the beginning of each document, the candidate that is most probable given the preceding context.
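The MBR filtering step described above can be illustrated with a minimal sketch. Note that this is an assumed, simplified implementation, not the authors' code: each candidate is scored by its average utility against all other candidates treated as pseudo-references, and the top-k survive. A toy unigram-F1 utility stands in for the metric the authors would actually use (e.g., a learned metric or chrF).

```python
def utility(hyp: str, ref: str) -> float:
    """Toy utility: unigram F1 overlap (stand-in for a real MT metric)."""
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    overlap = len(h & r)
    p, q = overlap / len(h), overlap / len(r)
    return 0.0 if p + q == 0 else 2 * p * q / (p + q)

def mbr_rerank(candidates: list[str], top_k: int = 30) -> list[str]:
    """Keep the top_k candidates by expected utility, using the
    other candidates as pseudo-references (MBR decoding)."""
    scores = []
    for i, hyp in enumerate(candidates):
        s = sum(utility(hyp, ref)
                for j, ref in enumerate(candidates) if j != i)
        scores.append(s / max(len(candidates) - 1, 1))
    ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
    return [c for c, _ in ranked[:top_k]]

cands = ["the cat sat on the mat",
         "the cat sat on a mat",
         "a dog ran in the park"]
print(mbr_rerank(cands, top_k=2))
# The two mutually similar candidates rank above the outlier.
```

In the paper's pipeline this filter would reduce hundreds of candidates per sentence to 30, after which the document-level LLM picks one candidate per sentence conditioned on the translations already chosen for the preceding sentences.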