Can the Variation of Model Weights be used as a Criterion for Self-Paced Multilingual NMT?

Anonymous

17 Jun 2023, ACL ARR 2023 June Blind Submission, Readers: Everyone

Abstract: Many-to-one neural machine translation systems improve over one-to-one systems when training data is scarce. In this paper, we design and test a novel algorithm for selecting the language of minibatches when training such systems. The algorithm changes the language of the minibatch when the weights of the model do not evolve significantly, as measured by the smoothed KL divergence between all layers of the Transformer network. This algorithm outperforms the use of alternating monolingual batches, but not the use of shuffled batches, in terms of translation quality (measured with BLEU and COMET) and convergence speed.
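The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how a distribution is derived from the weights, how the KL divergence is smoothed, what threshold is used, or in which order languages are visited, so everything below is an assumption made for illustration: each layer's weights are binned into an additively smoothed histogram, the KL divergence between two consecutive snapshots is averaged over layers, and the language rotates round-robin when that average falls below a threshold. The names `histogram`, `weight_variation`, `next_language`, `bins`, `eps`, and `threshold` are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence

Snapshot = Dict[str, Sequence[float]]  # layer name -> flattened weights


def histogram(values: Sequence[float], bins: int, lo: float, hi: float,
              eps: float) -> List[float]:
    """Normalized histogram over [lo, hi] with additive smoothing eps per bin."""
    counts = [eps] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp the right edge into the last bin
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); both inputs are strictly positive thanks to the smoothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def weight_variation(prev: Snapshot, curr: Snapshot, bins: int = 20,
                     eps: float = 1e-3) -> float:
    """Mean smoothed KL divergence over layers between two weight snapshots."""
    total = 0.0
    for name in curr:
        lo = min(min(prev[name]), min(curr[name]))
        hi = max(max(prev[name]), max(curr[name]))
        if hi == lo:  # degenerate layer: avoid a zero-width range
            hi = lo + 1.0
        p = histogram(curr[name], bins, lo, hi, eps)
        q = histogram(prev[name], bins, lo, hi, eps)
        total += kl_divergence(p, q)
    return total / len(curr)


def next_language(current: str, languages: List[str], variation: float,
                  threshold: float) -> str:
    """Assumed policy: rotate to the next language when the weights stall."""
    if variation < threshold:
        i = languages.index(current)
        return languages[(i + 1) % len(languages)]
    return current


if __name__ == "__main__":
    before = {"layer0": [0.0, 0.1, 0.2, 0.3]}
    after = {"layer0": [0.0, 0.1, 0.2, 0.3]}  # no change between snapshots
    v = weight_variation(before, after)
    print(next_language("fr", ["fr", "de", "es"], v, threshold=0.01))
```

In a real training loop the snapshots would be taken from the Transformer's parameters every few updates; the histogram-based distribution here is only one plausible way to make a KL divergence between weight sets well defined.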

Paper Type: short

Research Area: Machine Translation
