On the Consistency of Spherical Z Loss

Abhishek Sharma

Feb 12, 2018 (modified: Feb 12, 2018) ICLR 2018 Workshop Submission
  • Abstract: An extremely large and sparse output space in a deep network classifier induces two major challenges: high computational complexity and class ambiguity. Class ambiguity is usually tackled by optimizing the top-k error instead of the zero-one loss. To address computational complexity, the recent work of \cite{Vincent2015EfficientEG} and \cite{Brbisson15} introduced a family of spherical losses whose weight-update cost is independent of the output space size. Within this family, the Z loss is of particular interest since it outperforms both the other spherical losses and log-softmax on top-k scores. However, there is no theoretical result on the top-k calibration of the Z loss, nor any concrete connection between top-k scores and the hyper-parameters of the Z loss. This paper provides insight into the relationship between the two and explains how and why the hyper-parameters of the Z loss are essential for optimizing top-k scores.
  • Keywords: Spherical Z Loss, top-k calibration
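For readers unfamiliar with the top-k error mentioned in the abstract, a minimal sketch follows: an example counts as correct when its true class appears among the k highest-scoring classes. The function name and inputs are illustrative, not taken from the paper.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true class is not among the
    k highest-scoring classes (illustrative sketch, not the paper's code)."""
    errors = 0
    for s, y in zip(scores, labels):
        # Indices of the k classes with the largest scores.
        topk = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        if y not in topk:
            errors += 1
    return errors / len(labels)
```

For instance, with `scores = [[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]]` and `labels = [2, 1]`, neither true class has the single highest score, so the top-1 error is 1.0, while both fall within the top two, so the top-2 error is 0.0. The zero-one loss corresponds to the k = 1 case.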