Large Language Models (LLMs), such as GPT-4o, frequently produce hallucinations—factually incorrect or nonsensical outputs that are generally regarded as undesirable. This study, however, explores the notion of “good” hallucinations that may contribute to creativity and innovation. We propose metrics for assessing hallucination quality, focusing on correctness, consistency, and reasoning diversity, which are evaluated over sampled responses using semantic clustering. Our experiments vary prompting techniques and hyperparameter configurations to evaluate these metrics comprehensively. Furthermore, we investigate the distinction between process and outcome supervision, using multiple reasoning paths to enhance both creativity and accuracy. Preliminary results indicate that LLMs can generate creative hallucinations with minimal factual inaccuracies. This research provides a refined perspective on hallucinations in LLMs and suggests strategies to harness their creative potential, improving the reliability and flexibility of AI systems.
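To illustrate how reasoning diversity might be evaluated over sampled responses with semantic clustering, the following minimal sketch embeds several responses to one prompt and counts semantically distinct clusters. It is not the paper's released code; the embedding model, the agglomerative clustering choice, and the distance threshold are assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): estimate
# "reasoning diversity" by clustering embeddings of sampled responses.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_distances


def reasoning_diversity(responses: list[str], distance_threshold: float = 0.35) -> float:
    """Fraction of semantically distinct clusters among sampled responses."""
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embeddings = embedder.encode(responses, normalize_embeddings=True)
    distances = cosine_distances(np.asarray(embeddings))
    clustering = AgglomerativeClustering(
        n_clusters=None,                 # let the threshold decide cluster count
        metric="precomputed",            # we pass pairwise cosine distances
        linkage="average",
        distance_threshold=distance_threshold,  # assumed cutoff
    ).fit(distances)
    return clustering.n_clusters_ / len(responses)


# Example: responses sampled from an LLM at non-zero temperature for one prompt.
samples = [
    "The bridge fails because thermal expansion exceeds the joint tolerance.",
    "Heat makes the metal expand past what the joints allow, so it buckles.",
    "A resonance effect from wind, not temperature, is the main failure mode.",
]
print(f"diversity score: {reasoning_diversity(samples):.2f}")
```

Under this reading, a score near 1.0 means every sampled response follows a distinct line of reasoning, while a score near 1/n means the samples collapse onto a single explanation; the correctness and consistency metrics described above would be computed separately.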