MAML: \textbf{M}ulti-\textbf{A}gentic and \textbf{M}ulti-\textbf{L}evel CoT for LLM-Based Automatic Subtitle Translation Evaluation

ACL ARR 2025 February Submission 5848 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Subtitle translation is crucial for ensuring global accessibility, particularly for creative content such as films and television. However, manual translation is labor-intensive and time-consuming, often requiring linguistic and cultural adaptation. While post-editing accelerates the translation process, effective automatic evaluation methods are essential to ensure fair and reliable quality assessment while minimizing human effort. We propose TEAM (Translation Evaluation and Assessment with Multi-agents), a novel agent-based evaluation metric designed to identify the most creatively aligned translations while preserving linguistic quality. TEAM assesses key factors such as cultural relevance, emotional tone, humor, and engagement, helping post-editors select and refine the best machine-generated translations. Additionally, we propose MLCoT (Multi-Level Chain-of-Thought), a simpler metric in which multiple agents evaluate adequacy, fluency, and creativity. Experiments on English-Hindi and English-Spanish subtitles show that both TEAM and MLCoT outperform \textsc{CometKiwi} in preference ranking and correlation with human judgments.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Automatic Evaluation, Translation Quality Estimation, Multi-agent Framework
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Hindi, Spanish
Submission Number: 5848