Dr ChatGPT tell me what I want to hear: How different prompts impact health answer correctness

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Main
Submission Type: Regular Long Paper
Submission Track: Human-Centered NLP
Submission Track 2: Theme Track: Large Language Models and the Future of NLP
Keywords: ChatGPT, LLM, Health Misinformation, Prompt Knowledge, Consumer Health
TL;DR: Investigation of how different prompts impact the effectiveness of an LLM (ChatGPT) for health information seeking, including the risk of bias in retrieve-then-generate scenarios.
Abstract: This paper investigates the significant impact different prompts have on the behaviour of ChatGPT when used for health information seeking. As people increasingly depend on generative large language models (LLMs) like ChatGPT, it is critical to understand model behaviour under different conditions, especially for domains where incorrect answers can have serious consequences, such as health. Using the TREC Misinformation dataset, we empirically evaluate ChatGPT, showing not only its effectiveness but also that knowledge passed in the prompt can bias the model to the detriment of answer correctness. We show this occurs both in retrieve-then-generate pipelines and as a result of how a user phrases their question, as well as the question type. This work has important implications for the development of more robust and transparent question-answering systems based on generative large language models. Prompts, raw result files and manual analysis are made publicly available at \url{https://github.com/ielab/drchatgpt-health_prompting}.
Submission Number: 2653