Size- and Dispersion-Corrected Two-Level Softmax Sampling

Published: 30 May 2026, Last Modified: 01 Jun 2026
Venue: SPIGM @ ICML Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Softmax Sampling, Two-Level Softmax Sampling, Bias Correction, Sublinear-Time Inference, Recommender Systems
TL;DR: We characterize systematic biases in two-level softmax sampling, caused by cluster size imbalance and intra-cluster dispersion. We propose two corrected sampling methods that provide more faithful softmax approximations with negligible overhead.
Abstract: Sampling from a softmax distribution is a fundamental operation in machine learning, but its linear complexity in the number of items makes exact sampling impractical at scale. Two-level softmax (2LS) sampling is a popular alternative enabling sublinear-time sampling. Assuming items are partitioned into clusters, 2LS first samples a cluster and then an item within it. In this paper, we show that, despite its advantages, 2LS introduces systematic and undesirable sampling biases, which arise from misweighting clusters by ignoring both cluster size imbalance and intra-cluster similarity dispersion. We propose two sampling methods, Size-Corrected 2LS (S-2LS) and Size- and Dispersion-Corrected 2LS (SD-2LS), which correct these biases and provide provably better softmax approximations with negligible to non-existent computational overhead. In-depth experiments on five large-scale datasets validate the improved sampling properties of our methods. This workshop paper is under review for presentation at an international conference.
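To make the two sources of bias concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a common baseline in which each cluster is scored by the dot product between the query and the cluster centroid, and it reads "size correction" as adding the log cluster size to that score. Both choices, and the names `two_level_distribution` and `size_correct`, are illustrative assumptions. The motivation is that the exact softmax weight of a cluster is the sum of exp(score) over its items, which equals the cluster size times the mean of exp(score). The centroid score only captures the mean score, so it misses the size factor and, by Jensen's inequality, underestimates the rest whenever scores within the cluster are dispersed.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def two_level_distribution(query, items, labels, size_correct=False):
    """Item distribution induced by sampling a cluster, then an item in it.

    Assumptions (not taken from the paper): clusters are scored by
    query . centroid, every cluster id in 0..labels.max() is non-empty,
    and the size correction adds log(cluster size) to the cluster logit.
    """
    n_clusters = int(labels.max()) + 1
    scores = items @ query  # exact item logits

    cluster_logits = np.empty(n_clusters)
    for c in range(n_clusters):
        members = labels == c
        cluster_logits[c] = items[members].mean(axis=0) @ query
        if size_correct:
            cluster_logits[c] += np.log(members.sum())
    p_cluster = softmax(cluster_logits)

    p_item = np.empty(len(items))
    for c in range(n_clusters):
        members = labels == c
        # P(item) = P(cluster) * P(item | cluster)
        p_item[members] = p_cluster[c] * softmax(scores[members])
    return p_item


# Toy case with zero intra-cluster dispersion and imbalanced sizes (3 vs 1).
query = np.array([1.0, 0.5])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 0, 0, 1])
items = centers[labels]

exact = softmax(items @ query)
plain = two_level_distribution(query, items, labels)
corrected = two_level_distribution(query, items, labels, size_correct=True)
```

In this toy case every item equals its centroid, so the only error is the size imbalance: `corrected` matches `exact`, while `plain` gives too much mass to the item in the small cluster. Once items differ within a cluster, a residual gap remains even after the size term is added, which is the dispersion effect the abstract refers to.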
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 69