Abstract: Word alignment has until recently been dominated by GIZA++, a statistical method based on the 30-year-old IBM models. Newer methods primarily rely on large machine translation models, massively multilingual language models, or supervision. We introduce Embedding-Enhanced GIZA++, which outperforms GIZA++ without any of these resources. Using only the monolingual embedding spaces of the source and target languages, we exceed GIZA++'s performance in every tested scenario across three language pairs. In the lowest-resource setting, we outperform GIZA++ by 8.5, 10.9, and 12.0 AER for Ro-En, De-En, and En-Fr, respectively. We release our code at www.blind-review.code.
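To make the core idea concrete, here is a minimal sketch of how monolingual embeddings could inform an aligner: assuming the source and target embedding spaces have been projected into a shared cross-lingual space (e.g., with an offline mapping method such as VecMap), cosine similarity between word vectors yields an alignment score that could supplement GIZA++'s translation probabilities. All names and the scoring procedure below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def embedding_alignment_scores(src_emb, tgt_emb):
    """Score every (source, target) word pair by cosine similarity.

    src_emb, tgt_emb: dicts mapping words to vectors assumed to be
    already projected into a shared cross-lingual space (e.g., via a
    VecMap-style mapping -- an assumption, not necessarily the
    paper's procedure).
    """
    return {
        (s, t): cosine(sv, tv)
        for s, sv in src_emb.items()
        for t, tv in tgt_emb.items()
    }

# Toy usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
src = {w: rng.normal(size=4) for w in ["haus", "katze"]}
tgt = {w: rng.normal(size=4) for w in ["house", "cat"]}
scores = embedding_alignment_scores(src, tgt)
print(max(scores, key=scores.get))  # highest-scoring candidate pair
```

In a full system, such similarity scores would be combined with the statistical aligner's probabilities rather than used alone; how that combination is done is a design choice the abstract does not specify.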
Paper Type: short