Building a Statistical Machine Translation System from Scratch: How Much Bang for the Buck Can We Expect?

2001 (modified: 16 Jul 2019), DDMMT@ACL 2001
Abstract: We report on our experience with building a statistical MT system from scratch, including the creation of a small parallel Tamil-English corpus, and the results of a task-based pilot evaluation of statistical MT systems trained on sets of ca. 1300 and ca. 5000 parallel sentences of Tamil and English data. Our results show that even with apparently incomprehensible system output, humans without any knowledge of Tamil can achieve performance rates as high as 86% accuracy for topic identification, 93% recall for document retrieval, and 64% recall on question answering (plus an additional 14% partially correct answers).