Abstract: Discourse phenomena in existing document-level translation datasets are sparse, which has been a fundamental obstacle to the development of context-aware machine translation models. Moreover, most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption of sentence-level alignment. To mitigate these issues, we first curate a novel dataset of Chinese-English literature, consisting of 132 books with intricate discourse structures. We then propose a more pragmatic and challenging setting for context-aware translation, termed chapter-to-chapter (Ch2Ch) translation, and investigate the performance of commonly used machine translation models under this setting. Furthermore, we introduce a potential approach to fine-tuning large language models (LLMs) for Ch2Ch literary translation, yielding impressive improvements over baselines. Through comprehensive analysis, we reveal that literary translation in the Ch2Ch setting is inherently challenging, with respect to both model learning methods and translation decoding algorithms.
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: neural machine translation, context-aware neural machine translation, literary translation
Languages Studied: English, Chinese
Submission Number: 665