A cluster-to-cluster framework for neural machine translation


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: The quality of a machine translation system depends largely on the availability of sizable parallel corpora. For the recently popular Neural Machine Translation (NMT) framework, data sparsity problem can become even more severe. With large amount of tunable parameters, the NMT model may overfit to the existing language pairs while failing to understand the general diversity in language. In this paper, we advocate to broadcast every sentence pair as two groups of similar sentences to incorporate more diversity in language expressions, which we name as parallel cluster. Then we define a more general cluster-to-cluster correspondence score and train our model to maximize this score. Since direct maximization is difficult, we derive its lower-bound as our surrogate objective, which is found to generalize point-point Maximum Likelihood Estimation (MLE) and point-to-cluster Reward Augmented Maximum Likelihood (RAML) algorithms as special cases. Based on this novel objective function, we delineate four potential systems to realize our cluster-to-cluster framework and test their performances in three recognized translation tasks, each task with forward and reverse translation directions. In each of the six experiments, our proposed four parallel systems have consistently proved to outperform the MLE baseline, RL (Reinforcement Learning) and RAML systems significantly. Finally, we have performed case study to empirically analyze the strength of the cluster-to-cluster NMT framework.
  • TL;DR: We invent a novel cluster-to-cluster framework for NMT training, which can better understand the both source and target language diversity.
  • Keywords: Natural Language Processing, Machine Translation, Deep Learning, Data Augmentation