Smooth Loss Functions for Deep Top-k Classification


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: Human labeling of data constitutes a long and expensive process. As a consequence, many classification tasks entail incomplete annotation and incorrect labels, while being built on a restricted amount of data. In order to handle the ambiguity and the label noise, the performance of machine learning models is usually assessed with top-$k$ error metrics rather than top-$1$. Theoretical results suggest that to minimize this error, various loss functions, including cross-entropy, are equally optimal choices of learning objectives in the limit of infinite data. However, the choice of loss function becomes crucial in the context of limited and noisy data. Besides, our empirical evidence suggests that the loss function must be smooth and non-sparse to work well with deep neural networks. Consequently, we introduce a family of smoothed loss functions that are suited to top-$k$ optimization via deep learning. The widely used cross-entropy is a special case of our family. Evaluating our smooth loss functions is computationally challenging: a naïve algorithm would require $\mathcal{O}(\binom{C}{k})$ operations, where $C$ is the number of classes. Thanks to a connection to polynomial algebra and a divide-and-conquer approach, we provide an algorithm with a time complexity of $\mathcal{O}(k C)$. Furthermore, we present a novel and error-bounded approximation to obtain fast and stable algorithms on GPUs with single-precision floating point. We compare the performance of the cross-entropy loss and our margin-based losses in various regimes of noise and data size. Our investigation reveals that our loss provides on-par performance with cross-entropy for $k=1$, and is more robust to noise and overfitting for $k=5$.
  • TL;DR: Smooth Loss Function for Top-k Error Minimization
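
The gap between $\mathcal{O}(\binom{C}{k})$ and $\mathcal{O}(k C)$ can be illustrated with a small sketch. It rests on an assumption that the abstract does not spell out: that the expensive quantity is a sum of products over all size-$k$ subsets of $C$ per-class terms, i.e. an elementary symmetric polynomial. The function names `esp_naive` and `esp_recurrence` are hypothetical. The second function uses a simple sequential recurrence on the coefficients of $\prod_i (1 + x_i t)$, not the divide-and-conquer algorithm of the paper, and it does not address the single-precision stability issue mentioned in the abstract.

```python
from itertools import combinations
from math import prod


def esp_naive(x, k):
    """Sum of products over every k-subset of x: binom(len(x), k) terms."""
    return sum(prod(subset) for subset in combinations(x, k))


def esp_recurrence(x, k):
    """Same value, read off as the degree-k coefficient of prod_i (1 + x_i * t).

    Only coefficients up to degree k are kept, so the cost is about
    k * len(x) multiply-adds.
    """
    e = [1.0] + [0.0] * k  # e[j] holds e_j of the elements processed so far
    for xi in x:
        # Update from high to low degree so e[j - 1] is still the old value.
        for j in range(k, 0, -1):
            e[j] += xi * e[j - 1]
    return e[k]


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0, 4.0]  # hypothetical per-class terms, C = 4
    print(esp_naive(scores, 2), esp_recurrence(scores, 2))
```

Both functions return the same value on small inputs, but only the recurrence stays tractable as $C$ grows to the hundreds or thousands of classes typical of top-$k$ benchmarks.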