Keywords: Sampling temperature, LLM, maximum likelihood
Abstract: Autoregressive language models typically use a temperature parameter at inference to shape the probability distribution and control the randomness of the generated text. Once the text has been generated, this parameter can be estimated using a maximum likelihood approach. Building on this, we propose a procedure to estimate the temperature of any text, including text written by humans, with respect to a given language model. We evaluate the temperature estimation capability of a wide selection of small-to-medium LLMs. We then use the best-performing model, Qwen3 14B, to estimate the temperatures of popular corpora.
Paper Type: Short
Research Area: Natural Language Generation
Research Area Keywords: analysis
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data analysis, Theory
Languages Studied: English
Submission Number: 5508
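
Illustrative sketch: the abstract describes estimating the sampling temperature of a text by maximum likelihood. The snippet below is a minimal sketch of that idea and not the authors' implementation. It assumes the per-token logits of some language model are already available; here they are replaced by synthetic Gaussian logits. All names (softmax, score, estimate_temperature, simulate) and the search bounds are hypothetical. With beta = 1/T, the log-likelihood of the observed tokens is concave in beta, so the sketch finds the root of its derivative by bisection.

```python
import math
import random


def softmax(logits, beta):
    """Softmax of beta * logits, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(beta * (z - m)) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score(steps, beta):
    """Derivative of the log-likelihood with respect to beta = 1/T.

    steps is a list of (logits, observed_token_index) pairs.
    Each term is the observed logit minus the expected logit under softmax(beta * logits).
    """
    g = 0.0
    for logits, tok in steps:
        p = softmax(logits, beta)
        g += logits[tok] - sum(pi * z for pi, z in zip(p, logits))
    return g


def estimate_temperature(steps, beta_lo=1e-3, beta_hi=1e3, iters=50):
    """Maximum likelihood temperature via bisection on the score.

    The bounds on beta are an assumption of this sketch; estimates are
    clipped to [1 / beta_hi, 1 / beta_lo].
    """
    if score(steps, beta_lo) <= 0:
        return 1.0 / beta_lo  # text looks at least as random as the upper bound on T
    if score(steps, beta_hi) >= 0:
        return 1.0 / beta_hi  # text looks (nearly) greedy
    for _ in range(iters):
        mid = math.sqrt(beta_lo * beta_hi)  # bisect on a log scale
        if score(steps, mid) > 0:
            beta_lo = mid
        else:
            beta_hi = mid
    return 1.0 / math.sqrt(beta_lo * beta_hi)


def simulate(temperature, n_steps=1000, vocab=20, seed=0):
    """Synthetic stand-in for a model: random logits, tokens sampled at the given temperature."""
    rng = random.Random(seed)
    steps = []
    for _ in range(n_steps):
        logits = [rng.gauss(0.0, 2.0) for _ in range(vocab)]
        p = softmax(logits, 1.0 / temperature)
        tok = rng.choices(range(vocab), weights=p)[0]
        steps.append((logits, tok))
    return steps


if __name__ == "__main__":
    steps = simulate(temperature=0.7)
    print(f"estimated temperature: {estimate_temperature(steps):.3f}")
```

For real text, the logits at each position would come from a forward pass of the reference model over the text, and the observed token index would be the token actually present at that position.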