Keywords: temperature scaling, argmax inference, amortized inference, autoregressive models
TL;DR: We present a non-myopic improvement of standard autoregressive temperature scaling that generalizes to all likelihood-based models.
Abstract: Temperature scaling is a popular technique for tuning the sharpness of a model distribution. It is used extensively for sampling likely generations and calibrating model uncertainty, and is even exposed as a controllable parameter in many deployed large language models. However, autoregressive models rely on myopic temperature scaling that greedily optimizes the next token. To address this, we propose \textit{Long Horizon Temperature Scaling} (LHTS), a novel approach for sampling from temperature-scaled \textit{joint} distributions. LHTS is compatible with all likelihood-based models and optimizes for the long-horizon likelihood of samples. We derive a temperature-dependent LHTS objective and show that fine-tuning a model on a range of temperatures produces a single model capable of generation with a controllable long-horizon temperature parameter. We experiment with LHTS on image diffusion models and character/language autoregressive models, demonstrating its advantages over myopic temperature scaling in likelihood and sample quality, and showing a $10\%$ improvement in accuracy on a multiple-choice analogy task.
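The difference between myopic and joint temperature scaling can be illustrated on a toy example. The sketch below is not the LHTS method itself (which fine-tunes a model on a derived objective); it only enumerates a two-token distribution exactly to show the target that joint scaling aims for. The probabilities and the names `P_FIRST`, `P_SECOND`, `scale`, `myopic_joint`, and `long_horizon_joint` are hypothetical and chosen purely for illustration.

```python
import itertools

# Toy autoregressive model over two binary tokens (hypothetical numbers).
P_FIRST = [0.6, 0.4]                  # p(x1)
P_SECOND = [[0.5, 0.5], [0.9, 0.1]]   # p(x2 | x1), one row per value of x1


def scale(probs, temperature):
    """Raise probabilities to the power 1/T and renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def myopic_joint(temperature):
    """Sequence distribution when each conditional is scaled separately."""
    first = scale(P_FIRST, temperature)
    second = [scale(row, temperature) for row in P_SECOND]
    return {(a, b): first[a] * second[a][b]
            for a, b in itertools.product(range(2), repeat=2)}


def long_horizon_joint(temperature):
    """Sequence distribution when whole-sequence probabilities are scaled."""
    seqs = list(itertools.product(range(2), repeat=2))
    joint = [P_FIRST[a] * P_SECOND[a][b] for a, b in seqs]
    return dict(zip(seqs, scale(joint, temperature)))


if __name__ == "__main__":
    for name, fn in [("myopic", myopic_joint), ("joint", long_horizon_joint)]:
        dist = fn(0.1)
        print(name, {seq: round(p, 4) for seq, p in dist.items()})
```

In this toy model the most likely full sequence is `(1, 0)`, even though `0` is the more likely first token. At a low temperature such as `0.1`, per-token scaling commits to the first token `0` and places most of its mass on sequences starting with `0`, whereas scaling the joint distribution concentrates mass on `(1, 0)`. At temperature `1` the two distributions coincide.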
Submission Number: 2