Abstract: Expectation-maximization algorithms, such as those implemented in GIZA++, pervade the field of unsupervised word alignment. However, these algorithms are prone to over-fitting, leading to "garbage collector effects," where rare words tend to be erroneously aligned to untranslated words. This paper proposes a leave-one-out expectation-maximization algorithm for unsupervised word alignment to address this problem. The proposed method excludes information derived from the alignment of a sentence pair from the alignment models used to align it, which prevents erroneous alignments within a sentence pair from reinforcing themselves. Experimental results on Chinese-English and Japanese-English corpora show that the F1, precision, and recall of alignment consistently increased by 5.0%–17.2%, and BLEU scores of end-to-end translation rose by 0.03–1.30. The proposed method also outperformed l0-normalized GIZA++ and Kneser-Ney smoothed GIZA++.
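To make the core idea concrete, the following is a minimal sketch of leave-one-out EM for an IBM Model 1-style lexical translation model (the simplest of the models underlying GIZA++), not the paper's actual implementation. The function name `leave_one_out_em` and the parameters `bitext`, `iterations`, and `smoothing` are hypothetical. The key step is that each sentence pair's expected counts from the previous iteration are subtracted from the global counts before that pair is re-aligned.

```python
from collections import defaultdict

def leave_one_out_em(bitext, iterations=5, smoothing=1e-6):
    """Toy leave-one-out EM for an IBM Model 1-style lexical model.

    `bitext` is a list of (source_tokens, target_tokens) pairs.
    When aligning a sentence pair, translation probabilities are
    estimated from global expected counts MINUS that pair's own
    previous contribution, so alignments within a pair cannot
    reinforce themselves.
    """
    global_counts = defaultdict(float)   # (src, tgt) -> expected count
    global_totals = defaultdict(float)   # src -> total expected count
    pair_counts = [defaultdict(float) for _ in bitext]
    pair_totals = [defaultdict(float) for _ in bitext]

    for _ in range(iterations):
        for i, (src, tgt) in enumerate(bitext):
            new_counts = defaultdict(float)
            new_totals = defaultdict(float)
            for t in tgt:
                # Leave-one-out probabilities: remove this pair's own
                # previous contribution before normalizing. The additive
                # `smoothing` term is a crude floor; on the first pass all
                # counts are zero, so the distribution starts out uniform.
                probs = []
                for s in src:
                    c = max(global_counts[(s, t)] - pair_counts[i][(s, t)], 0.0)
                    z = max(global_totals[s] - pair_totals[i][s], 0.0)
                    probs.append((c + smoothing) / (z + smoothing))
                denom = sum(probs)
                # E-step: posterior over which source word aligns to t.
                for s, p in zip(src, probs):
                    gamma = p / denom
                    new_counts[(s, t)] += gamma
                    new_totals[s] += gamma
            # M-step bookkeeping: swap in this pair's fresh contribution.
            for key, c in pair_counts[i].items():
                global_counts[key] -= c
            for key, c in pair_totals[i].items():
                global_totals[key] -= c
            for key, c in new_counts.items():
                global_counts[key] += c
            for key, c in new_totals.items():
                global_totals[key] += c
            pair_counts[i], pair_totals[i] = new_counts, new_totals

    return global_counts, global_totals
```

Under this scheme, a rare word that co-occurs with an untranslated word in only one sentence pair gains no support for that alignment when the pair itself is held out, which is the intuition behind the reduction in garbage collector effects.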