LLM-Empowered Medical Patient Communication: A Data-Centric Survey From a Clinical Perspective

ACL ARR 2025 February Submission5701 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: The integration of large language models (LLMs) into medical patient communication has shown promise for enhancing healthcare accessibility. Despite significant advances in LLM capabilities, real-world clinical adoption remains challenging because of gaps between in-lab LLM training and the complexities of clinical practice. This survey provides a systematic, data-centric review of 21 text-based medical datasets that support LLM training and evaluation for patient communication. From a clinical perspective, we propose a novel taxonomy that classifies these datasets by key clinical properties and, building on it, identify the training objectives they support. Additionally, we introduce a full-lifecycle framework for optimizing the development of medical LLMs through alignment across dataset selection, fine-tuning methodologies, benchmarks, and evaluation metrics, highlighting the impact of this alignment on model performance and training effectiveness. Finally, we provide guidance on enhancing medical datasets through clinically informed annotations and adaptive learning techniques to support the development of safe, clinically aligned LLMs for patient-centered communication in real-world healthcare settings.

Paper Type: Long

Research Area: Human-Centered NLP

Research Area Keywords: healthcare applications, clinical NLP, human-centered evaluation, conversational modeling, human-AI interaction, evaluation methodologies

Contribution Types: Surveys

Languages Studied: English

Submission Number: 5701
