Abstract: Word alignment has until recently been dominated by GIZA++, a statistical method based on the 30-year-old IBM models. Newer methods primarily rely on large machine translation models, massively multilingual language models, or supervision. We introduce Embedding-Enhanced GIZA++, which outperforms GIZA++ without any of these resources. Using only the monolingual embedding spaces of the source and target languages, we exceed GIZA++'s performance in every tested scenario across three language pairs. In the lowest-resource setting, we outperform GIZA++ by 8.5, 10.9, and 12.0 AER for Ro-En, De-En, and En-Fr, respectively. We release our code at www.blind-review.code.
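To make the core idea concrete, here is a minimal sketch of how monolingual embeddings could inform an aligner: assuming the source and target embedding spaces have been projected into a shared cross-lingual space (e.g., with an offline mapping method such as VecMap), cosine similarity between word vectors yields an alignment score that could supplement GIZA++'s translation probabilities. All names and the scoring procedure below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def embedding_alignment_scores(src_emb, tgt_emb):
    """Score every (source, target) word pair by cosine similarity.

    src_emb, tgt_emb: dicts mapping words to vectors assumed to be
    already projected into a shared cross-lingual space (e.g., via a
    VecMap-style mapping -- an assumption, not necessarily the
    paper's procedure).
    """
    return {
        (s, t): cosine(sv, tv)
        for s, sv in src_emb.items()
        for t, tv in tgt_emb.items()
    }

# Toy usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
src = {w: rng.normal(size=4) for w in ["haus", "katze"]}
tgt = {w: rng.normal(size=4) for w in ["house", "cat"]}
scores = embedding_alignment_scores(src, tgt)
print(max(scores, key=scores.get))  # highest-scoring candidate pair
```

In a full system, such similarity scores would be combined with the statistical aligner's probabilities rather than used alone; how that combination is done is a design choice the abstract does not specify.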
Paper Type: short