Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs

Published: 08 Jul 2025, Last Modified: 26 Aug 2025 · COLM 2025 · CC BY 4.0
Keywords: Natural Language Processing, Large Language Models, Text Generation, Sampling Methods, Truncation Sampling, Stochastic Sampling, Min-p Sampling, Top-p Sampling, Temperature Sampling, Decoding Methods, LLM Reasoning
TL;DR: We propose selective sampling, a method that dynamically switches between greedy and high-temperature sampling based on a sampling risk metric.
Abstract: Diversity is essential for language models to generate creative outputs. Temperature-based sampling is a common strategy for increasing diversity. However, on tasks that require high precision, e.g., mathematical reasoning, uncontrolled high-temperature sampling, e.g., with min-$p$ or top-$p$ truncation, lowers reasoning quality. We demonstrate that this loss of accuracy is caused by sampling incorrect continuations at sensitive positions where entropy is high. To address this, we propose selective sampling, a method that dynamically switches between greedy and high-temperature sampling based on a sampling risk metric. This metric estimates the likelihood of an output error when high-temperature sampling is applied at the current token position. We train a lightweight classifier on a small set of verifiable problems to predict sampling risk; the classifier integrates with the base language model at minimal latency overhead. Experiments on mathematical reasoning tasks show that selective sampling improves the quality-diversity trade-off, even under high-temperature settings.
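For intuition, here is a minimal sketch of the decoding loop the abstract describes, assuming a HuggingFace-style causal LM and a hypothetical linear `risk_head` over the last hidden state; the names, the min-$p$ truncation variant, and the threshold `tau` are illustrative assumptions, not the authors' released implementation.

```python
import torch

def min_p_filter(logits: torch.Tensor, p_base: float = 0.1) -> torch.Tensor:
    """Mask tokens whose probability is below p_base * (max probability)."""
    probs = torch.softmax(logits, dim=-1)
    threshold = p_base * probs.max(dim=-1, keepdim=True).values
    return logits.masked_fill(probs < threshold, float("-inf"))

@torch.no_grad()
def selective_sample(model, risk_head, input_ids, max_new_tokens=256,
                     temperature=1.5, tau=0.5):
    """Greedy-decode at high-risk positions, sample at high temperature elsewhere."""
    for _ in range(max_new_tokens):
        out = model(input_ids, output_hidden_states=True)
        logits = out.logits[:, -1, :]              # next-token logits
        hidden = out.hidden_states[-1][:, -1, :]   # last-layer hidden state

        # Hypothetical classifier: predicted probability that
        # high-temperature sampling causes an error at this position.
        risk = torch.sigmoid(risk_head(hidden)).item()

        if risk > tau:
            # Sensitive position: fall back to greedy decoding.
            next_id = logits.argmax(dim=-1, keepdim=True)
        else:
            # Low risk: temperature-scale, truncate with min-p, then sample.
            filtered = min_p_filter(logits / temperature)
            probs = torch.softmax(filtered, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)

        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids
```

Because the risk head reads the hidden state already computed for the next-token logits, the per-token overhead in this sketch is a single extra linear layer, which is consistent with the minimal-latency claim above.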
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 876