Gradient-as-retrieval: Classification beyond Cross Entropy

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Loss function, classification, supervised learning
Abstract: Cross entropy (CE) is the loss of choice for classification tasks. However, computing the CE loss and its gradient requires evaluating transcendental functions, which can be expensive in emerging computational paradigms such as fully homomorphic encryption for privacy-preserving applications. The transcendental-function-free familywise (FW) loss has been shown to enjoy strictly better statistical guarantees than the CE loss. In this work, we prove theoretical results that enable efficient computation of the gradient of the FW loss using "retrieval-style" algorithms. Based on our theory, we provide practical implementations. A challenge in designing new loss functions is that widely adopted optimizers and learning rate schedules are tuned to CE. Experimentally, we demonstrate that the FW loss outperforms CE when we opt for parameter-free learning methods.
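To make the abstract's cost argument concrete, here is a minimal NumPy sketch of the standard softmax cross-entropy loss and gradient; it is an illustration of the baseline, not the submission's FW method. The exp and log calls are exactly the transcendental operations the abstract identifies as expensive under fully homomorphic encryption.

```python
import numpy as np

def softmax_ce_grad(logits, target):
    """Softmax cross-entropy loss and gradient for a single example.

    `target` is the index of the true class. The exp/log calls below are
    the transcendental operations that are costly in FHE settings.
    """
    z = logits - logits.max()          # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax (requires exp)
    loss = -np.log(p[target])          # CE loss (requires log)
    grad = p.copy()
    grad[target] -= 1.0                # dCE/dlogits = softmax(z) - one_hot(y)
    return loss, grad

loss, grad = softmax_ce_grad(np.array([2.0, 0.5, -1.0]), target=0)
```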
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 12242