Abstract: Minimum Bayes Risk (MBR) decoding has become a popular decoding strategy for different natural language generation tasks, especially machine translation. MBR relies on an estimator of an expected loss, where the learned model serves as a proxy for the target distribution with respect to which the expectation is taken. However, this reliance can be problematic if the model is a flawed proxy, for example, due to a lack of training data in a specific domain. In this work, we show how using a posterior over model parameters, and decoding with a weighted average over multiple models, can improve the performance of MBR by accounting for uncertainty over the learned model. We benchmark different methods for learning posteriors and show that performance correlates with the diversity of the combined set of models' predictions. Intriguingly, prediction diversity also determines whether risk can be successfully used for selective prediction.
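The following is a minimal sketch of the idea described in the abstract: MBR decoding where the expected loss is estimated under a weighted average over multiple models (approximating a posterior over parameters). All names, the sampling scheme, and the number of pseudo-references are illustrative assumptions, not the authors' implementation.

```python
def mbr_decode(candidates, models, weights, loss):
    """Select the candidate with the lowest expected loss under a
    weighted mixture of models.

    candidates: list of hypothesis strings (e.g. sampled translations)
    models:     list of callables; models[k](n) returns n pseudo-references
                sampled from the k-th model
    weights:    posterior weights over models, summing to 1
    loss:       loss(hyp, ref) -> float, e.g. 1 - BLEU or 1 - COMET
    """
    n_samples = 32  # pseudo-references per model (illustrative choice)
    # Draw pseudo-references from each model in the ensemble.
    refs_per_model = [m(n_samples) for m in models]

    def risk(hyp):
        # Weighted Monte Carlo estimate of the expected loss:
        # sum_k w_k * (1/N) * sum_j loss(hyp, ref_kj)
        return sum(
            w * sum(loss(hyp, r) for r in refs) / len(refs)
            for w, refs in zip(weights, refs_per_model)
        )

    # Standard MBR with a single model corresponds to len(models) == 1;
    # here the risk is averaged across models, weighted by the posterior.
    return min(candidates, key=risk)
```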
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: efficient inference for MT, MT theory, modeling, pre-training for MT
Contribution Types: NLP engineering experiment
Languages Studied: English, German
Submission Number: 499