Improving Diversity in Language Models: When Temperature Fails, Change the Loss

Published: 01 May 2025, Last Modified: 18 Jun 2025. ICML 2025 poster. License: CC BY 4.0
TL;DR: We propose loss functions that improve diversity in language models, compensating for cases where temperature scaling fails.
Abstract: Increasing diversity in language models is a challenging yet essential objective. A common approach is to raise the decoding temperature. In this work, we investigate this approach through a simple yet common case to provide insight into why decreasing the temperature can improve quality (Precision), while increasing it often fails to boost coverage (Recall). Our analysis reveals that for a model to be effectively tunable through temperature adjustments, it must be trained toward coverage. To address this, we propose rethinking loss functions in language models by leveraging the Precision-Recall framework. Our results demonstrate that this approach achieves a substantially better trade-off between Precision and Recall than merely combining negative log-likelihood training with temperature scaling. These findings offer a pathway toward more versatile and robust language modeling techniques.
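To make the decoding knob discussed in the abstract concrete, here is a minimal sketch of temperature-scaled softmax. The logit values are hypothetical; the point is only that temperature redistributes probability mass among tokens the model already scores, sharpening the distribution when T < 1 and flattening it when T > 1, but it cannot create mass on modes the model has effectively assigned zero probability:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: divide logits by T, then normalize."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical next-token logits for illustration.
logits = [2.0, 1.0, 0.5, -1.0]

sharp = softmax_with_temperature(logits, 0.5)  # T < 1: sharper, favors the top token
base  = softmax_with_temperature(logits, 1.0)  # standard softmax
flat  = softmax_with_temperature(logits, 2.0)  # T > 1: flatter, spreads mass out
```

The probability of the top token grows as the temperature drops, while raising the temperature only rebalances mass across the model's existing support, which is why, as the paper argues, the model must be trained toward coverage for high-temperature decoding to yield genuine diversity.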
Lay Summary: Making language models more diverse in what they generate is important but not easy. A popular way to do this is by increasing the “temperature” during decoding, which is meant to make outputs more varied. In this study, we look closely at this method using a simple example to understand why lowering the temperature can improve output quality, but raising it often doesn’t help with generating more diverse content. We find that for temperature changes to work well, the model must first be trained to focus on covering a wide range of possibilities. To do this, we suggest a new way to train language models using a framework that balances quality and coverage. Our experiments show that this new approach works better than the usual method of just adjusting temperature after training. This could help build language models that are both more accurate and more flexible.
Primary Area: Deep Learning->Large Language Models
Keywords: Language Models, Diversity, Precision, Recall, Temperature
Submission Number: 13645