CMT Id: 174
Abstract: Often, the performance on a supervised machine learning task is evaluated with a
\emph{task loss} function that cannot be optimized directly. Examples of such loss functions
include the classification error, the edit distance and the BLEU
score. A common workaround for this
problem is to instead optimize a \emph{surrogate loss}
function, such as cross-entropy or hinge loss. In
order for this remedy to be effective, it is
important to ensure that minimization of the surrogate loss
results in minimization of the task loss, a condition
that we call \emph{consistency with the task loss}.
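A minimal sketch of this condition, in notation we introduce here purely for illustration (the paper's formal definition may differ), writing $\rho(x, y, \theta)$ for the surrogate loss, $L(y, \hat{y})$ for the task loss and $\hat{y}(x, \theta)$ for the model's prediction:
\[
\theta^{\ast} \in \arg\min_{\theta} \; \mathbb{E}_{(x, y)}\big[\rho(x, y, \theta)\big]
\;\Longrightarrow\;
\theta^{\ast} \in \arg\min_{\theta} \; \mathbb{E}_{(x, y)}\big[L\big(y, \hat{y}(x, \theta)\big)\big],
\]
that is, any minimizer of the expected surrogate loss is also a minimizer of the expected task loss.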
In this work, we propose another method for deriving
differentiable surrogate losses that provably meet this requirement.
We focus on the broad class
of models that define a score for every input-output pair.
Our idea is that this score can be interpreted as an
estimate of the task loss, and that the estimation error may
be used as a consistent surrogate loss. A distinctive feature of
this approach is that it defines the desired value of the
score for every input-output pair. We use this property
to design specialized surrogate losses for Encoder-Decoder
models often used for sequence prediction tasks.
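To make the idea concrete, the following is a minimal PyTorch-style sketch, not the paper's implementation; \texttt{score\_model}, \texttt{task\_loss} and \texttt{candidates} are hypothetical placeholders for a scoring network, a non-differentiable task loss such as edit distance, and a set of candidate output sequences:
\begin{verbatim}
import torch

def loss_estimation_surrogate(score_model, x, candidates, reference, task_loss):
    """Mean squared error between predicted scores and observed task losses.

    score_model(x, y): differentiable scalar score for the pair (x, y)
    task_loss(reference, y): non-differentiable task loss, e.g. edit distance
    candidates: candidate output sequences (e.g. sampled or beam-searched)
    """
    errors = []
    for y in candidates:
        predicted = score_model(x, y)  # differentiable estimate of the task loss
        target = torch.tensor(float(task_loss(reference, y)))  # fixed target
        errors.append((predicted - target) ** 2)
    return torch.stack(errors).mean()
\end{verbatim}
At decoding time one would then select the candidate with the lowest predicted score, since lower scores are trained to track lower task loss.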
In our experiments, we benchmark on the task of speech
recognition. Using the new surrogate loss instead of
cross-entropy to train an Encoder-Decoder speech recognizer
brings a significant relative improvement of approximately 9\% in
terms of Character Error Rate (CER) when no
additional text corpora are used for language modeling.
Conflicts: umontreal.ca, jacobs-university.de