Abstract: Maximum Likelihood Estimation (MLE) is commonly used in machine translation, where models with higher likelihood are assumed to translate better. However, this assumption does not hold for non-autoregressive Transformers (NATs), a new family of translation models. In this paper, we present both theoretical and empirical analyses of why simply maximizing the likelihood does not produce a good NAT model. Based on the theoretical analysis, we propose Maximum Proxy-Likelihood Estimation (MPLE), a novel method that addresses this training issue in MLE. Additionally, MPLE offers a new perspective on existing successes in training NATs: much previous work can be regarded as implicitly optimizing our objective.