Leveraging CoHere Multilingual Embeddings and Inverted Softmax Retrieval for Automatic Parallel Sentence Alignment in Low-Resource Languages
Abstract: We present an improved method for automatic
parallel sentence alignment in low-resource
languages. The method combines CoHere multilingual
embeddings with inverted softmax retrieval.
It achieves an F1-score of 78.30% on the
MAFAND-MT test set, compared with 54.75% for
the existing technique. Precision and recall
follow a similar pattern. We also assess the
quality of the extracted data and show that it
yields better low-resource translation
performance than data extracted with the
existing technique.
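To illustrate the retrieval step named above, the following is a minimal sketch of inverted softmax retrieval in its general form, not the authors' implementation. It assumes sentence embeddings are already available as plain lists of floats (toy 2-dimensional vectors stand in for the CoHere embeddings here). The function names and the temperature `beta` are hypothetical choices, since the abstract does not state them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def inverted_softmax_scores(src_vecs, tgt_vecs, beta=10.0):
    """Score every (source, target) pair with the inverted softmax.

    A standard softmax would normalise each source row over all targets.
    The inverted softmax instead normalises each target column over all
    sources, so a target that is close to many sources receives a lower
    score for each of them. `beta` is an assumed temperature value.
    """
    sims = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    n_src, n_tgt = len(src_vecs), len(tgt_vecs)
    # One normaliser per target sentence, summed over all source sentences.
    col_norm = [
        sum(math.exp(beta * sims[i][j]) for i in range(n_src))
        for j in range(n_tgt)
    ]
    return [
        [math.exp(beta * sims[i][j]) / col_norm[j] for j in range(n_tgt)]
        for i in range(n_src)
    ]


def align(src_vecs, tgt_vecs, beta=10.0):
    """Return, for each source sentence, the index of its best target."""
    scores = inverted_softmax_scores(src_vecs, tgt_vecs, beta)
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy embeddings (assumed, for illustration only).
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.9, 0.1], [0.1, 0.9]]
alignment = align(src, tgt)
```

Normalising over source sentences rather than target sentences is what distinguishes this from ordinary nearest-neighbour retrieval: a target sentence that is similar to many sources has its score shared among them, which reduces the chance that one such sentence is selected as the match for many different sources.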
Submission Number: 4