Abstract: Even when decoding with temperature $T=0$, large language models (LLMs) can produce divergent outputs for identical inputs. Recent work has consistently highlighted implementation-level sources of nondeterminism, including batch-size variation, kernel non-invariance, and floating-point non-associativity. In this work, we formalize this behavior by introducing the notion of background temperature $T_{\mathrm{bg}}$: the effective temperature induced by an implementation-dependent perturbation process, observed even at nominal $T=0$. We provide clean definitions, show how $T_{\mathrm{bg}}$ relates to a stochastic perturbation governed by the inference environment $I$, and propose an empirical protocol to estimate $T_{\mathrm{bg}}$ via the equivalent temperature $T_n(I)$ of an ideal reference system. We conclude with pilot experiments run on a representative pool of models from the major LLM providers, demonstrating the idea and outlining implications for reproducibility, evaluation, and deployment.
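The abstract's notion of an "equivalent temperature" can be illustrated with a minimal sketch. The paper's actual protocol is not reproduced here; the estimator below is a hypothetical stand-in that, given reference logits for an ideal system and the empirical distribution of outputs observed across repeated nominal-$T=0$ runs, grid-searches the temperature whose scaled softmax best matches the observed distribution (by KL divergence). The function names and the KL-matching criterion are assumptions for illustration only.

```python
import math

def softmax(logits, T):
    # Temperature-scaled softmax over a list of logits.
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def equivalent_temperature(logits, empirical, grid=None):
    """Hypothetical estimator: grid-search the temperature T whose
    softmax over the reference logits best matches an empirically
    observed output distribution, measured by KL(empirical || p_T)."""
    if grid is None:
        grid = [i / 1000 for i in range(1, 2001)]  # T in (0, 2]
    best_T, best_kl = None, float("inf")
    for T in grid:
        p = softmax(logits, T)
        kl = sum(q * math.log(q / max(pi, 1e-12))
                 for q, pi in zip(empirical, p) if q > 0)
        if kl < best_kl:
            best_T, best_kl = T, kl
    return best_T
```

For example, if repeated "greedy" runs of a system with reference logits `[2.0, 1.0, 0.0]` produce outputs whose empirical frequencies match a softmax at $T=0.5$, the estimator recovers a background temperature near 0.5 rather than the nominal 0.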
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: - De-anonymized the manuscript for camera-ready submission.
- Added a reference to the GitHub repository implementing the basic pipeline for estimating background temperature (beginning of Section 6).
- Adjusted line breaks and spacing to improve readability and respect page limits.
Code: https://github.com/RaiCRITS/background-temperature-estimation
Assigned Action Editor: ~Yonatan_Bisk1
Submission Number: 6133