Keywords: LLM, Hallucination, SFT Dataset, Evaluation Benchmark, Reliability
TL;DR: We teach LLMs to acknowledge uncertainty by fine-tuning them on questions about validated non-existent terms, reducing hallucination by 1-26% while maintaining general capabilities.
Abstract: Large language models (LLMs) often hallucinate, producing fluent but false information, in part because supervised fine-tuning (SFT) implicitly rewards always responding. We introduce $\textbf{HypoTermInstruct}$, an architecture-agnostic SFT dataset (31,487 responses for 11,151 questions) that teaches models to acknowledge uncertainty using systematically generated queries about validated non-existent ($\textit{hypothetical}$) terms. We also release $\textbf{HypoTermQA-Enhanced}$, a benchmark for measuring hallucination tendency, strengthened through multiple validation passes. In 400 controlled LoRA SFT runs (Llama3.1-8B-Instruct and Gemma3-4B-it; 100 fine-tuning configurations each with a paired control), replacing generic instruction samples with HypoTermInstruct samples increases HypoTerm Score by +1.36% to +26.46% (median differences) and FactScore by +0.52% to +0.61%, with modest MMLU decreases (−0.26% to −0.31%) and negligible shifts in instruction following and safety. Results show that targeted uncertainty instruction during SFT reduces hallucination without architecture-specific engineering or preference/RL pipelines.
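The intervention described in the abstract is a data-level substitution: a fraction of the generic instruction-tuning pool is swapped for HypoTermInstruct samples before LoRA SFT. The sketch below illustrates that mixing step only; the function name `mix_sft_data`, the sample fields, the substitution ratio, and the example term are illustrative assumptions, not the authors' code or settings.

```python
# Minimal sketch (not the paper's implementation): substitute a fraction of a
# generic SFT pool with HypoTermInstruct samples, keeping total size constant.
import random

def mix_sft_data(generic_samples, hypoterm_samples, substitution_ratio=0.1, seed=0):
    """Replace `substitution_ratio` of the generic instruction samples with
    HypoTermInstruct samples so the overall SFT set size stays unchanged."""
    rng = random.Random(seed)
    n_sub = min(int(len(generic_samples) * substitution_ratio), len(hypoterm_samples))
    kept = rng.sample(generic_samples, len(generic_samples) - n_sub)
    injected = rng.sample(hypoterm_samples, n_sub)
    mixed = kept + injected
    rng.shuffle(mixed)
    return mixed

# Hypothetical usage: samples are instruction/response dicts; HypoTermInstruct
# responses acknowledge uncertainty about a validated non-existent term.
generic = [{"instruction": f"q{i}", "response": f"a{i}"} for i in range(1000)]
hypoterm = [{"instruction": "What is the Veltman cascade index?",
             "response": "I'm not aware of a term by that name; it may not exist."}
            for _ in range(200)]
train_set = mix_sft_data(generic, hypoterm, substitution_ratio=0.1)
```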
Submission Number: 78