Score Combination for Improved Parallel Corpus Filtering for Low Resource Conditions

2020 (modified: 03 May 2024), WMT@EMNLP 2020
Abstract: This paper describes our submission to the WMT20 sentence filtering task. We combine three scores: scores from custom LASER models built for each source language, scores from a classifier trained to distinguish positive from negative sentence pairs, and the original scores provided with the task. In the mBART setup provided by the organizers, our method achieves relative improvements over the baseline of 7% for Pashto and 5% for Khmer in sacreBLEU score on the test set.
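
The abstract does not say how the three scores are combined, so the following is only a minimal sketch of one plausible scheme: min-max normalize each score and take a weighted average per sentence pair. The function names (`min_max`, `combine_scores`, `rank_pairs`), the score keys, the weights, and the toy values are all assumptions made for illustration; none of them come from the paper.

```python
from typing import Dict, List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def combine_scores(scores: Dict[str, Sequence[float]],
                   weights: Dict[str, float]) -> List[float]:
    """Weighted average of min-max normalized scores, one value per sentence pair.

    Assumption: a linear combination; the paper may use a different scheme.
    """
    lengths = {len(col) for col in scores.values()}
    if len(lengths) != 1:
        raise ValueError("all score lists must cover the same sentence pairs")
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = {name: min_max(col) for name, col in scores.items()}
    n = lengths.pop()
    return [
        sum(weights[name] * normalized[name][i] for name in scores) / total
        for i in range(n)
    ]


def rank_pairs(combined: Sequence[float]) -> List[int]:
    """Indices of sentence pairs, best combined score first."""
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    # Toy values for three sentence pairs (hypothetical, not from the paper).
    toy_scores = {
        "laser": [0.1, 0.9, 0.5],
        "classifier": [0.2, 0.8, 0.5],
        "original": [0.0, 1.0, 0.5],
    }
    toy_weights = {"laser": 1.0, "classifier": 1.0, "original": 1.0}
    combined = combine_scores(toy_scores, toy_weights)
    print(rank_pairs(combined))
```

In a filtering setting, the ranked indices would then be used to keep the highest-scoring pairs up to the task's size budget. Normalizing first keeps a score with a wide numeric range from dominating the average.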