Machine Translation of Cooking Videos Using Descriptions of the Images by Chain-of-Thought Augmentation

ACL ARR 2025 July Submission1443 Authors

29 Jul 2025 (modified: 19 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0
Abstract: English cooking videos often contain polysemous words and omitted expressions, which makes accurate translation challenging. This study aims to improve English-Japanese machine translation of cooking videos by using images extracted from the videos. We adopt a Chain-of-Thought Augmentation (CoTA) approach, in which the model generates descriptions of the images and uses them as auxiliary information for the translation task. In our experiments, we selected sentences from an English-Japanese cooking video corpus that were difficult to translate because of polysemous words. We evaluated GPT-4o and Qwen2-VL using COMET and BLEU scores. The results show that incorporating images improves translation accuracy, and that CoTA applied to GPT-4o shows a particularly strong tendency to produce more accurate translations.
Paper Type: Short
Research Area: Machine Translation
Research Area Keywords: Machine Translation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Japanese, English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: In section 3
B2 Discuss The License For Artifacts: No
B2 Elaboration: Because the data we used are publicly available.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: In section 3
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: In section 3
B5 Documentation Of Artifacts: Yes
B5 Elaboration: In section 3
B6 Statistics For Data: Yes
B6 Elaboration: In section 3
C Computational Experiments: Yes
C1 Model Size And Budget: No
C1 Elaboration: The parameters of GPT models are opaque.
C2 Experimental Setup And Hyperparameters: N/A
C3 Descriptive Statistics: No
C3 Elaboration: Only the results of a single execution are listed.
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: In Appendix E
Author Submission Checklist: yes
Submission Number: 1443
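
The two-step flow described in the abstract (generate an image description, then translate with that description as auxiliary context) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `VisionLM` callable interface, the function names, and the choice to pass only text in the second step are all hypothetical, and `stub_model` is a canned stand-in so the sketch runs without any real GPT-4o or Qwen2-VL client.

```python
from typing import Callable, Optional

# Hypothetical interface (an assumption, not from the paper): a model takes a
# text prompt plus an optional image reference and returns generated text.
VisionLM = Callable[[str, Optional[str]], str]


def describe_image(model: VisionLM, image_path: str) -> str:
    """Step 1: ask the model to describe the frame extracted from the video."""
    prompt = "Describe what is shown in this cooking video frame."
    return model(prompt, image_path)


def build_translation_prompt(source_en: str, description: str) -> str:
    """Build the step-2 prompt: the source sentence plus the generated description."""
    return (
        "Translate the following English sentence from a cooking video into Japanese.\n"
        f"Image description (auxiliary context): {description}\n"
        f"English: {source_en}\n"
        "Japanese:"
    )


def translate_with_cota(model: VisionLM, source_en: str, image_path: str) -> str:
    """Run both steps: describe the image, then translate using the description."""
    description = describe_image(model, image_path)
    prompt = build_translation_prompt(source_en, description)
    # Assumption: step 2 receives text only; the paper may also pass the image.
    return model(prompt, None)


def stub_model(prompt: str, image: Optional[str]) -> str:
    """Canned stand-in for a vision-language model so the sketch runs offline."""
    if image is not None:
        return "A hand whisking eggs in a bowl."
    return "卵を泡立てます。"


if __name__ == "__main__":
    # "frame_001.jpg" is a placeholder path; the stub never opens it.
    print(translate_with_cota(stub_model, "Beat them well.", "frame_001.jpg"))
```

Keeping the model behind a plain callable means the same flow could be pointed at different backends, with the description step acting as the only place where the image enters the pipeline.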