Abstract: Large language models (LLMs) can produce overconfident and factually unsupported answers, limiting their reliability for tasks that demand faithfulness to provided evidence.
Softmax tempering, which multiplies the pre-softmax logits by a temperature $T$ at training time, was originally used for knowledge distillation and has since emerged as
a simple approach to improving both confidence calibration and factual consistency.
In this paper, we provide (1)~a structured literature review of softmax tempering in Transformer-based models, and (2)~an empirical study using \model, comparing tempered fine-tuning against standard fine-tuning on SQuAD v2 and a new dataset, PolyCompQA, which contains QA pairs based on tables from the polymer composite literature. Our experiments reveal that moderate temperatures (e.g., $T=1.67$) reduce hallucinations and improve calibration metrics, with minimal implementation overhead.
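The tempering operation described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: following the abstract's phrasing, logits are multiplied by $T$ before the softmax (note that the knowledge-distillation convention instead divides by $T$), and the function name and default $T=1.67$ are taken from the example temperature above.

```python
import math

def tempered_softmax(logits, T=1.67):
    """Softmax over logits scaled by temperature T before normalization.

    Multiplying by T > 1 sharpens the output distribution; in the
    knowledge-distillation convention (logits / T), T > 1 softens it.
    """
    scaled = [z * T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At training time, the tempered logits would simply replace the raw logits inside the cross-entropy loss; at inference the standard ($T=1$) softmax can be used unchanged.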
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: softmax tempering, table QA
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Theory
Languages Studied: English
Submission Number: 8375