Keywords: LLM, Tag-Aware Machine Translation
Abstract: Internet texts often contain rich formatting tags that carry structural, semantic, and functional meaning. However, current large language model (LLM)-based translation systems struggle to balance translation fluency with tag structure preservation when processing tagged text. To address this, we propose a tag-aware machine translation optimization framework leveraging LLMs, which aims to simultaneously enhance translation quality and preserve tag structures. First, we design a hybrid tag synthesis strategy to generate structurally complex, high-quality tagged training data. We then introduce a two-stage tag-aware training framework: in the first stage, supervised fine-tuning is conducted through multi-task learning so that the model deeply understands tag semantics and their scope; in the second stage, a multi-reward mechanism is introduced for reinforcement learning, using fine-grained reward functions to optimize translation adequacy and tag preservation. For evaluation, we construct a multilingual tag translation test set with complex tag structures (to be open-sourced) and propose a comprehensive evaluation metric. Experimental results demonstrate that our method significantly outperforms existing approaches in both translation quality and tag structure consistency.
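The abstract does not specify the exact form of the fine-grained reward functions. As a minimal illustrative sketch only (not the paper's implementation), a tag-preservation reward could score how faithfully the source tag sequence is reproduced in the translation, and be mixed with an adequacy score; the regex, the LCS-based scoring, and the weighting `alpha` below are all assumptions for illustration.

```python
import re

TAG_PATTERN = re.compile(r"</?[a-zA-Z][^>]*>")  # matches HTML/XML-style format tags


def tag_preservation_reward(source: str, translation: str) -> float:
    """Hypothetical fine-grained reward: fraction of source tags reproduced
    in the translation in the same relative order (via longest common
    subsequence). The paper's actual reward design is not given in the abstract."""
    src_tags = TAG_PATTERN.findall(source)
    hyp_tags = TAG_PATTERN.findall(translation)
    if not src_tags:
        return 1.0  # nothing to preserve
    m, n = len(src_tags), len(hyp_tags)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if src_tags[i] == hyp_tags[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / m


def combined_reward(source: str, translation: str,
                    adequacy_score: float, alpha: float = 0.5) -> float:
    """Assumed weighted mix of a translation-adequacy score in [0, 1]
    (e.g., from a learned quality metric) and tag preservation."""
    return alpha * adequacy_score + (1 - alpha) * tag_preservation_reward(source, translation)
```

For example, a translation that drops one of two source tags but is otherwise adequate (adequacy 0.9, alpha 0.5) would receive a combined reward of 0.5 * 0.9 + 0.5 * 0.5 = 0.7 under this sketch.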
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Machine Translation
Contribution Types: NLP engineering experiment, Theory
Languages Studied: English, Chinese, Japanese, German, French, Russian
Submission Number: 3304