Differentiable Top-k: From One-Hot to k-Hot

Published: 21 Nov 2025 · Last Modified: 21 Nov 2025 · DiffSys 2025 · CC BY 4.0
Keywords: top-k, k-hot, subset, relaxed, differentiable, sampling
TL;DR: We propose a framework for differentiable top-k by generalizing from one-hot to k-hot.
Abstract: The one-hot representation, argmax operator, and its differentiable relaxation, softmax, are ubiquitous in machine learning. These building blocks lie at the heart of everything from the cross-entropy loss and attention mechanism to differentiable sampling. However, their $k$-hot counterparts are not as universal. In this paper, we consolidate the literature on differentiable top-$k$, showing how the $k$-capped simplex connects relaxed top-$k$ operators and $\pi$ps (probability-proportional-to-size) sampling to form an intuitive generalization of one-hot sampling. In addition, we propose sigmoid top-$k$, a scalable relaxation of the top-$k$ operator that is fully differentiable and defined for continuous $k$. We validate our approach empirically and demonstrate its computational efficiency.
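To make the idea concrete, here is a minimal, illustrative sketch of a sigmoid-based top-$k$ relaxation of the kind the abstract describes: the threshold of an elementwise sigmoid is tuned (here by bisection) so that the soft gates sum to $k$, yielding a relaxed $k$-hot vector. The function name soft_topk, the temperature parameter, and the bisection strategy are assumptions for illustration only, not necessarily the paper's exact formulation.

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def soft_topk(scores, k, temperature=0.1, iters=50):
    # Illustrative sketch (assumed, not the paper's exact method):
    # pick a threshold tau by bisection so that the sigmoid gates
    # expit((scores - tau) / temperature) sum to k. The result is a
    # relaxed k-hot vector; k may be any real value in (0, len(scores)).
    lo, hi = scores.min() - 10.0, scores.max() + 10.0
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if expit((scores - tau) / temperature).sum() > k:
            lo = tau   # too much total mass selected: raise the threshold
        else:
            hi = tau   # too little mass selected: lower the threshold
    return expit((scores - 0.5 * (lo + hi)) / temperature)

scores = np.array([2.0, -1.0, 0.5, 3.0, 0.1])
khot = soft_topk(scores, k=2)
print(khot.round(3), khot.sum())  # gates near 1 for the two largest scores; sum is ~2
```

Ported to an autodiff framework, the same construction yields gradients with respect to the scores (the bisection loop can simply be unrolled), and the temperature trades off sharpness of the relaxation against gradient quality.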
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 39