A differentiable BLEU loss. Analysis and first results
Noe Casas, José A.R. Fonollosa, Marta R. Costa-jussà
Feb 12, 2018 (modified: Feb 12, 2018); ICLR 2018 Workshop Submission; readers: everyone
Abstract: In natural language generation tasks, like neural machine translation and image captioning, there is usually a mismatch between the optimized loss and the de facto evaluation criterion, namely token-level maximum likelihood and corpus-level BLEU score. This article tries to reduce this gap by defining differentiable computations of the BLEU and GLEU scores. We test this approach on simple tasks, obtaining valuable lessons on its potential applications but also its pitfalls, mainly that these loss functions push each token in the hypothesis sequence toward the average of the tokens in the reference, resulting in a poor training signal.
TL;DR: Differentiable BLEU loss --> Poor results on simple tasks --> Analysis --> BLEU loss pushes weights toward average of reference tokens, hence poor training signal
Keywords: differentiable, BLEU, GLEU, NMT, seq2seq
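
The pitfall named in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' formulation: it covers only a soft unigram precision, with no count clipping, no higher-order n-grams and no brevity penalty, and the function names `soft_unigram_precision` and `soft_unigram_precision_grad` are hypothetical. Under those assumptions, the score is the expected fraction of hypothesis positions whose token occurs in the reference, and its gradient turns out to be the same reference bag-of-words vector at every position.

```python
import numpy as np

def soft_unigram_precision(probs, reference, vocab_size):
    """Expected fraction of hypothesis positions whose token occurs in the reference.

    probs: (T, V) array; row t is the model's distribution over the vocabulary
           at hypothesis position t.
    reference: list of reference token ids.
    Simplifying assumption: no clipping of repeated tokens, unigrams only.
    """
    present = np.zeros(vocab_size)
    present[list(set(reference))] = 1.0   # 1 for every token type in the reference
    return float((probs @ present).mean())

def soft_unigram_precision_grad(probs, reference, vocab_size):
    """Analytic gradient of the score above with respect to probs.

    The score is linear in probs, so the gradient is present / T for every row:
    it does not depend on the position t at all.
    """
    present = np.zeros(vocab_size)
    present[list(set(reference))] = 1.0
    T = probs.shape[0]
    return np.tile(present / T, (T, 1))

# Toy example: 3 hypothesis positions, vocabulary of 4 tokens, uniform predictions.
probs = np.full((3, 4), 0.25)
reference = [0, 2, 2]
score = soft_unigram_precision(probs, reference, vocab_size=4)
grad = soft_unigram_precision_grad(probs, reference, vocab_size=4)
```

In this toy setting every row of `grad` is identical, so each hypothesis position is pushed toward the same set of reference tokens regardless of where those tokens occur in the reference. That is one way to read the abstract's remark about a poor training signal, though the paper's own analysis concerns its full BLEU and GLEU losses rather than this reduced case.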