AgentDiscoTrans: Agentic LLMs for Discourse-level Machine Translation

ACL ARR 2025 February Submission 7568 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: We propose AgentDiscoTrans, a novel agentic framework for document-level machine translation that uses specialized LLM agents to process long documents at the discourse level. The system segments an input document into coherent discourse units, drawing on the theory presented in "Attention, Intentions, and the Structure of Discourse", and then translates each unit with a Translation Agent that incorporates contextual information from a dynamic Memory. A Memory Agent maintains and updates critical translation cues, such as discourse markers, entity mappings, noun-pronoun mappings, and phrase translations, to keep translations consistent across discourse units. Experiments on multiple datasets, including the TED test sets from IWSLT2017, the mZPRT corpus, and the WMT2022 dataset, show that our system outperforms competitive baselines (such as NLLB, Google Translate, and DELTA) on automatic metrics (d-BLEU, d-COMET, TER) and in human evaluations of both General Quality and Discourse Awareness. Ablation studies further confirm that both discourse segmentation and Memory updating are important for high-quality translation.
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: discourse relations, discourse parsing, discourse and multilinguality, domain adaptation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English, Chinese, German, French, Japanese
Submission Number: 7568
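
The control flow described in the abstract (segment the document, translate each unit with access to Memory, then update Memory) might be sketched as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: the names `Memory`, `segment_into_units`, `translate_document`, `toy_translate`, and `toy_update` are hypothetical, the blank-line segmenter stands in for the paper's discourse segmentation, and the two toy functions stand in for LLM-backed Translation and Memory Agents. The "Dr. Smith" to "Dr. Schmidt" mapping is invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    """Translation cues carried across discourse units.

    The four categories mirror those listed in the abstract; the
    dictionary representation is an assumption made for this sketch.
    """
    discourse_markers: Dict[str, str] = field(default_factory=dict)
    entity_mappings: Dict[str, str] = field(default_factory=dict)
    noun_pronoun_mappings: Dict[str, str] = field(default_factory=dict)
    phrase_translations: Dict[str, str] = field(default_factory=dict)


def segment_into_units(document: str) -> List[str]:
    # Placeholder segmenter (assumption): split on blank lines.
    # The paper's discourse-level segmentation is not reproduced here.
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def translate_document(
    document: str,
    translate_unit: Callable[[str, Memory], str],
    update_memory: Callable[[Memory, str, str], None],
) -> List[str]:
    """Translate unit by unit, threading one shared Memory through the loop."""
    memory = Memory()
    translations: List[str] = []
    for unit in segment_into_units(document):
        target = translate_unit(unit, memory)   # Translation Agent step
        update_memory(memory, unit, target)     # Memory Agent step
        translations.append(target)
    return translations


def toy_translate(unit: str, memory: Memory) -> str:
    # Stand-in for an LLM call: only applies remembered entity mappings.
    out = unit
    for source_name, target_name in memory.entity_mappings.items():
        out = out.replace(source_name, target_name)
    return out


def toy_update(memory: Memory, source: str, target: str) -> None:
    # Stand-in for the Memory Agent: records one hypothetical mapping
    # the first time the entity is seen in the source text.
    if "Dr. Smith" in source:
        memory.entity_mappings["Dr. Smith"] = "Dr. Schmidt"


if __name__ == "__main__":
    doc = "Dr. Smith arrived.\n\nThen Dr. Smith spoke."
    for line in translate_document(doc, toy_translate, toy_update):
        print(line)
```

Because Memory is updated only after a unit has been translated, a cue recorded while processing one unit first takes effect in the next unit. That ordering is a choice made in this sketch; the abstract does not specify it.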