Who Can be Your AI Doctor?: Evaluation for Disease diagnosis on Large Language Models

Jonghyeon Kim, Chan-Yang Ju, Dong-Ho Lee

Published: 2023, Last Modified: 06 May 2026, Venue: ICTC 2023, License: CC BY-SA 4.0
Abstract: Large Language Models (LLMs) have demonstrated outstanding performance in general-domain knowledge inference tasks such as arithmetic reasoning, commonsense reasoning, and open-domain question answering. Traditional LLMs were mainly developed and served by big tech companies as commercial products, making it nearly impossible for researchers to access their parameters. However, the recent emergence of non-commercial LLMs has led to ongoing research efforts to surpass commercial LLMs in specific areas using task-specific or domain-specific LLMs. In this study, among the many possible tasks and domains, we focus on the medical domain, which demands high-level knowledge, and in particular on Automatic Diagnosis Systems (ADS). We evaluate whether current representative LLMs can perform disease diagnosis effectively and whether they have the potential to assist doctors in clinical situations. We also examine the overall quality of the responses, that is, whether each LLM can diagnose accurately based on an understanding of the patient’s basic information such as age, sex, underlying diseases, and family history, and we explore whether non-commercial LLMs are likely to outperform commercial LLMs in ADS.