Abstract: Large language models (LLMs) rely heavily on sampling methods to generate diverse and high-quality text.
While existing sampling methods like top-$p$ and min-$p$ address the detrimental effects of low-probability tails in LLMs' output distributions, they still fail to effectively distinguish diversity from noise.
This limitation stems from their reliance on probability-based metrics that are inherently sensitive to temperature scaling.
Through empirical and theoretical analysis, we make two key discoveries: (1) the pre-softmax logits exhibit a clear statistical separation between informative tokens and noise, and (2) min-$p$ sampling is mathematically equivalent to top-$(1-p)$ sampling under a uniform distribution over logits.
These findings motivate the design of top-$n\sigma$, a novel sampling method that identifies informative tokens by eliminating noise directly in logit space.
Unlike existing methods that become unstable at high temperatures, top-$n\sigma$ achieves temperature-invariant token selection while preserving output diversity.
Extensive experiments across reasoning and creative writing tasks demonstrate that our method consistently outperforms existing approaches, with particularly significant improvements in high-temperature settings.
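The abstract does not spell out the selection rule, so the sketch below illustrates one plausible reading of logit-space filtering in the spirit of top-$n\sigma$: tokens whose logits lie within $n$ standard deviations of the maximum logit are kept, so the selected set does not change when the logits are divided by temperature. The function name `top_nsigma_filter` and the keep-band rule are illustrative assumptions, not the paper's exact specification.

```python
# A minimal sketch of logit-space filtering in the spirit of top-n-sigma.
# Assumption: tokens whose logits fall within `n` standard deviations of the
# maximum logit are retained; the paper's exact rule may differ.
import torch

def top_nsigma_filter(logits: torch.Tensor, n: float = 1.0, temperature: float = 1.0) -> torch.Tensor:
    """Return a probability distribution restricted to 'informative' tokens.

    logits:      1-D tensor of pre-softmax scores over the vocabulary.
    n:           width of the retained band, in standard deviations of the logits.
    temperature: softmax temperature; it only reshapes the kept mass,
                 never which tokens are kept.
    """
    threshold = logits.max() - n * logits.std()   # selection happens in logit space,
    mask = logits >= threshold                    # so it is unaffected by temperature
    scaled = logits / temperature
    scaled = scaled.masked_fill(~mask, float("-inf"))
    return torch.softmax(scaled, dim=-1)

# Example: sample one token id from the filtered distribution.
probs = top_nsigma_filter(torch.randn(32000) * 4.0, n=1.0, temperature=1.5)
next_token = torch.multinomial(probs, num_samples=1)
```

Because the mask is computed before temperature scaling, raising the temperature spreads probability only among the retained tokens, which matches the temperature-invariant selection claimed above.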
Paper Type: Long
Research Area: Generation
Research Area Keywords: inference methods, text-to-text generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 1757