Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability

Yanjun Gao; Skatje Myers; Shan Chen; Dmitriy Dligach; Timothy A Miller; Danielle Bitterman; Guanhua Chen; Anoop Mayampurath; Matthew Churpek; Majid Afshar

Published: 12 Oct 2024, Last Modified: 11 Nov 2024

Venue: GenAI4Health Poster

License: CC BY 4.0

Keywords: large language models, uncertainty estimation, electronic health record, pre-test probability estimation

TL;DR: We evaluate the limitations of large language models in estimating pre-test probabilities for clinical decision support.

Abstract: Large language models (LLMs) are being explored for diagnostic decision support, yet their ability to estimate pre-test probabilities, which is vital for clinical decision-making, remains limited. This study evaluates two LLMs, Mistral-7B and Llama3-70B, on three diagnosis tasks using structured electronic health record data. We examine three current methods of extracting probability estimates from LLMs and reveal their limitations. We aim to highlight the need for improved techniques for LLM confidence estimation.
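
The abstract does not name the three extraction methods, so the following is only an illustration of what the title means by "next-word probability": a minimal sketch, assuming a yes/no diagnostic prompt and access to the model's raw logits for the next token. The function name, the token strings, and the logit values are all hypothetical and are not taken from the paper.

```python
import math


def next_word_probability(logits, positive="yes", negative="no"):
    """Renormalized softmax over two candidate answer tokens.

    `logits` maps token strings to raw next-token scores (hypothetical input).
    The result is the share of probability mass on `positive` relative to
    `positive` + `negative`; all other tokens are ignored.
    """
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits[positive], logits[negative])
    exp_pos = math.exp(logits[positive] - m)
    exp_neg = math.exp(logits[negative] - m)
    return exp_pos / (exp_pos + exp_neg)


# Made-up logits for illustration only; not model output from the study.
example_logits = {"yes": 2.0, "no": 0.0, "maybe": -1.0}
p_yes = next_word_probability(example_logits)
print(f"P('yes' | prompt) = {p_yes:.3f}")
```

A number produced this way describes how the model distributes mass over answer tokens for a given prompt. As the title argues, it should not be read as a pre-test probability of disease without further evidence of calibration.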

Submission Number: 20
