- Reviewed Version (pdf): https://openreview.net/references/pdf?id=dUfrxjHT7w
- Keywords: natural language processing, language modeling, sequence modeling, temperature scaling
- Abstract: Current practice in temperature scaling assumes either a fixed schedule or a manually crafted, dynamically changing one. However, our studies indicate that the optimal temperature trajectory for each class can change with the context. To this end, we propose context-aware temperature, a generalized approach that provides an individual optimal temperature trajectory over the context for each word in the vocabulary, while allowing the temperature to be learned along with the remaining model parameters during training. Experimental results confirm that the proposed method significantly improves state-of-the-art language models, achieving a perplexity of 19.90 on Penn Treebank, 33.88 on WikiText-2, and 4.7 on WikiText-103.
- Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
- One-sentence Summary: We propose context-aware temperature, a mechanism that enables temperature scaling for language models based on the context of each word.
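
To make the idea in the abstract concrete, the following is a minimal sketch of a softmax in which every vocabulary entry has its own temperature, compared against a single fixed temperature. It is an illustration only, not the authors' implementation: the function name `softmax_with_temperature` is hypothetical, and the temperatures are passed in as plain numbers, whereas the abstract describes them as learned from the context together with the other model parameters.

```python
import math


def softmax_with_temperature(logits, temperatures):
    """Softmax over logits, each divided by its own positive temperature.

    Hypothetical helper for illustration; in the paper's setting the
    temperatures would come from a learned, context-dependent component.
    """
    if len(logits) != len(temperatures):
        raise ValueError("need one temperature per logit")
    if any(t <= 0 for t in temperatures):
        raise ValueError("temperatures must be positive")
    scaled = [z / t for z, t in zip(logits, temperatures)]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]

# Fixed temperature: the same value is shared by every vocabulary entry.
fixed = softmax_with_temperature(logits, [1.0, 1.0, 1.0])

# Per-token temperature: only the first entry is softened, so its
# scaled logit (2.0 / 2.0) now ties with the second entry (1.0 / 1.0).
per_token = softmax_with_temperature(logits, [2.0, 1.0, 1.0])
```

With a shared temperature the ranking of the logits is always preserved, while per-token temperatures can change the relative probabilities of individual words, which is the extra freedom a context-dependent temperature would exploit.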