Evaluating Advanced Large Language Models for Pulmonary Disease Diagnosis Using Portable Spirometer Data: A Comparative Analysis of Gemini-1.5 Pro, GPT-4o, and Claude-3.5 Sonnet

Jin-Hyun Park; Chinock Cheong; Sanghee Kang; INPYO LEE; Sungjin Lee; Kisang Yoon; Hwamin Lee

Published: 25 Sept 2024, Last Modified: 24 Oct 2024

Venue: IEEE BHI'24

License: CC BY 4.0

Keywords: Chain of Thought, Claude 3.5 Sonnet, Clinical Rationale Generation, Large Language Models, Medical Guidelines Prompt

TL;DR: This study compares Gemini 1.5 Pro, GPT 4o, and Claude 3.5 Sonnet for interpreting pulmonary function test data, finding Claude 3.5 Sonnet superior in accuracy and clinical rationale.

Abstract: Pulmonary function tests (PFTs) are vital for diagnosing various pulmonary conditions, including chronic obstructive pulmonary disease (COPD) and asthma. Traditional PFTs, conducted using laboratory-based spirometers, are accurate but costly and require skilled technicians. Recent advancements in portable spirometry and large language models (LLMs) offer promising alternatives for remote diagnostics and clinical decision support. This study evaluates the performance of three advanced LLMs (Gemini 1.5 Pro, GPT 4o, and Claude 3.5 Sonnet) in understanding and interpreting PFT data. The models were assessed using three prompt types (zero-shot, guidelines-enhanced, and few-shot), and their performance was measured in terms of accuracy, precision, recall, F1 score, and processing speed. Results indicate that Claude 3.5 Sonnet consistently outperformed the other models across all metrics, demonstrating superior comprehension and classification abilities. Error analysis revealed specific areas for improvement, particularly in logical reasoning and adherence to guidelines. The findings highlight the potential of LLMs to enhance diagnostic processes and reduce healthcare costs, while also emphasizing the need for further research to address data privacy, interoperability, and ethical considerations for clinical integration. Future efforts should focus on leveraging open-source models and expanding datasets to optimize LLMs for real-world medical applications.
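
As an illustration of the classification metrics named in the abstract (accuracy, precision, recall, F1 score), the following is a minimal sketch of how they might be computed from a model's predicted labels. It is not the authors' evaluation code. The function name, the macro-averaging choice, and the class labels ("normal", "obstructive", "restrictive") are assumptions made for illustration; the abstract does not state the label set or the averaging scheme.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (unweighted mean over classes) is an assumption;
    the source does not say which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Undefined ratios (zero denominator) are reported as 0.0.
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    truth = ["normal", "obstructive", "restrictive", "obstructive"]
    preds = ["normal", "obstructive", "obstructive", "obstructive"]
    print(classification_metrics(truth, preds))
```

Macro averaging weights every class equally, which makes a class the model never predicts correctly pull the averages down even if it is rare; a weighted or micro average would behave differently on imbalanced data.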

Track: 2. Large Language Models for biomedical and clinical research

Supplementary Material: zip

Registration Id: 42N8ZW6TQYC

Submission Number: 326
