Task Loss Estimation for Structured Prediction

Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

20 Apr 2024 (modified: 18 Feb 2016) · ICLR 2016 workshop submission · Readers: Everyone
CMT Id: 174
Abstract: Often, the performance on a supervised machine learning task is evaluated with a \emph{task loss} function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a \emph{surrogate loss} function, such as cross-entropy or hinge loss. For this remedy to be effective, it is important to ensure that minimization of the surrogate loss results in minimization of the task loss, a condition that we call \emph{consistency with the task loss}. In this work, we propose a new method for deriving differentiable surrogate losses that provably meet this requirement. We focus on the broad class of models that define a score for every input-output pair. Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss. A distinctive feature of this approach is that it defines the desirable value of the score for every input-output pair. We use this property to design specialized surrogate losses for Encoder-Decoder models, which are often used for sequence prediction tasks. In our experiment, we benchmark on the task of speech recognition. Training an Encoder-Decoder speech recognizer with the new surrogate loss instead of cross-entropy brings a significant ~9\% relative improvement in Character Error Rate (CER) when no extra corpora are used for language modeling.
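To make the idea in the abstract concrete, here is a minimal sketch of a surrogate loss built from the task-loss estimation error. It is not the authors' implementation: the scorer `score_net`, the `edit_distance` task loss, and the squared-error form of the estimation error are illustrative assumptions.

```python
# Illustrative sketch only -- not the paper's implementation.
# Assumptions: `score_net(x, y)` is any differentiable network that returns a
# scalar score for an input-output pair, and `edit_distance` is the
# (non-differentiable) task loss that the score should estimate.
import torch

def estimation_surrogate(score_net, x, y_candidate, y_reference, edit_distance):
    predicted_task_loss = score_net(x, y_candidate)  # differentiable estimate
    true_task_loss = float(edit_distance(y_candidate, y_reference))  # fixed target
    # The estimation error is differentiable w.r.t. the network parameters,
    # and driving it to zero makes the score equal the true task loss,
    # so minimizing it is consistent with the task loss.
    return (predicted_task_loss - true_task_loss) ** 2
```

Under these assumptions, prediction amounts to searching for the output with the lowest score, which by construction approximates the output with the lowest task loss.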
Conflicts: umontreal.ca, jacobs-university.de
