Keywords: LLM, Tag-Aware Machine Translation
Abstract: Internet texts often contain rich formatting tags that carry structural, semantic, and functional meaning. However, current large language model (LLM)-based translation systems struggle to balance translation fluency with tag structure preservation when processing tagged text. To address this, we propose a tag-aware machine translation optimization framework leveraging LLMs, which aims to simultaneously enhance translation quality and preserve tag structures. First, we design a hybrid tag synthesis strategy to generate structurally complex, high-quality tagged training data. We then introduce a two-stage tag-aware training framework: in the first stage, supervised fine-tuning is conducted through multi-task learning so that the model deeply understands tag semantics and their scope; in the second stage, a multi-reward mechanism is introduced for reinforcement learning, using fine-grained reward functions to optimize translation adequacy and tag preservation. For evaluation, we construct a multilingual tag translation test set with complex tag structures (to be open-sourced) and propose a comprehensive evaluation metric. Experimental results demonstrate that our method significantly outperforms existing approaches in both translation quality and tag structure consistency.
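The abstract does not specify the exact form of the fine-grained reward functions. As a minimal illustrative sketch only (not the paper's implementation), a tag-preservation reward could score how faithfully the source tag sequence is reproduced in the translation, and be mixed with an adequacy score; the regex, the LCS-based scoring, and the weighting `alpha` below are all assumptions for illustration.

```python
import re

TAG_PATTERN = re.compile(r"</?[a-zA-Z][^>]*>")  # matches HTML/XML-style format tags


def tag_preservation_reward(source: str, translation: str) -> float:
    """Hypothetical fine-grained reward: fraction of source tags reproduced
    in the translation in the same relative order (via longest common
    subsequence). The paper's actual reward design is not given in the abstract."""
    src_tags = TAG_PATTERN.findall(source)
    hyp_tags = TAG_PATTERN.findall(translation)
    if not src_tags:
        return 1.0  # nothing to preserve
    m, n = len(src_tags), len(hyp_tags)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if src_tags[i] == hyp_tags[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / m


def combined_reward(source: str, translation: str,
                    adequacy_score: float, alpha: float = 0.5) -> float:
    """Assumed weighted mix of a translation-adequacy score in [0, 1]
    (e.g., from a learned quality metric) and tag preservation."""
    return alpha * adequacy_score + (1 - alpha) * tag_preservation_reward(source, translation)
```

For example, a translation that drops one of two source tags but is otherwise adequate (adequacy 0.9, alpha 0.5) would receive a combined reward of 0.5 * 0.9 + 0.5 * 0.5 = 0.7 under this sketch.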
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Machine Translation
Contribution Types: NLP engineering experiment, Theory
Languages Studied: English, Chinese, Japanese, German, French, Russian
Submission Number: 3304