Med-PMC: A Personalized Multi-modal Framework for Dynamic Clinical Interaction and Assessment of Large Language Models
Abstract: The application of Multi-modal Large Language Models (MLLMs) in medical clinical scenarios remains underexplored. Previous benchmarks focus only on the capacity of MLLMs in medical visual question answering (VQA) or report generation and fail to assess their performance on complex clinical multi-modal tasks. In this paper, we propose a novel Medical Personalized Multi-modal Consultation (Med-PMC) paradigm to evaluate the clinical capacity of MLLMs. Med-PMC builds a simulated clinical environment in which MLLMs are required to interact with a patient simulator to complete the multi-modal information-gathering and decision-making task. Specifically, the patient simulator is decorated with personalized actors to simulate diverse patients in real scenarios. We conduct extensive experiments to assess 12 types of MLLMs, providing a comprehensive view of their clinical performance. We find that current MLLMs fail to gather multi-modal information and show potential bias in the decision-making task when consulting the personalized patient simulators. Further analysis demonstrates the effectiveness of Med-PMC, showing its potential to guide the development of robust and reliable clinical MLLMs. Code and data will be released upon acceptance.
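To make the consultation paradigm concrete, the sketch below shows one possible shape of the multi-turn loop between a doctor MLLM and a personalized patient simulator. It is a minimal illustration, not the authors' released implementation: the class and function names (PatientActor, simulate_consultation, the "[FINAL DECISION]" stop convention, etc.) are assumptions introduced here for clarity.

```python
# Minimal sketch of a Med-PMC-style consultation loop (illustrative assumptions only).
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PatientActor:
    """Personality profile used to 'decorate' the patient simulator."""
    persona: str          # e.g., "anxious elderly patient, vague about symptoms"
    case_record: str      # ground-truth clinical case the simulator role-plays
    images: List[str] = field(default_factory=list)  # paths to multi-modal evidence


def simulate_consultation(
    doctor_mllm: Callable[[List[Dict]], str],
    patient_sim: Callable[[PatientActor, List[Dict]], str],
    actor: PatientActor,
    max_turns: int = 10,
) -> List[Dict]:
    """Run a multi-turn consultation between a doctor MLLM and a patient simulator.

    The doctor model asks questions (information gathering); the patient simulator
    answers in character, possibly referring to attached images; the loop ends when
    the doctor issues a final decision or the turn budget is exhausted.
    """
    history: List[Dict] = []
    for _ in range(max_turns):
        doctor_msg = doctor_mllm(history)
        history.append({"role": "doctor", "content": doctor_msg})
        if doctor_msg.strip().startswith("[FINAL DECISION]"):  # assumed stop marker
            break
        patient_msg = patient_sim(actor, history)
        history.append({"role": "patient", "content": patient_msg})
    return history
```

The resulting dialogue history could then be scored for information-gathering coverage and decision quality, which is the kind of assessment the abstract describes; the exact metrics are not specified here.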
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: vision question answering; multimodality
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1540