Evaluating Large Language Models as AI Agents for Cross-Border Healthcare Delivery in the European Union

14 Sept 2025 (modified: 06 Dec 2025), Submitted to Agents4Science, CC BY 4.0
Keywords: Medical LLM, EHDS, Cross-border health
TL;DR: General AI models (Claude, ChatGPT) outperformed specialised medical AI by 84-100% on EU travel health queries, suggesting that real-world healthcare needs broad knowledge of local systems, not just medical expertise.
Abstract: This study evaluates six Large Language Models (LLMs) as autonomous agents for providing cross-border healthcare information in EU travel scenarios. We tested three general-purpose models (Claude 3.5, Gemini 2.0, ChatGPT-4o) and three specialised medical models (Internist AI, OpenBioLLM, Biomistral) across five increasingly complex prompts simulating travellers' diarrhoea scenarios in Paris, Tallinn, and Rome. Our evaluation framework assessed models' abilities to provide location-specific medical guidance, understand EU healthcare regulations, and envision integration with the European Health Data Space (EHDS). Results show that general-purpose models significantly outperformed specialised medical models (average scores: Claude 4.6/5, ChatGPT 4.8/5 vs. medical models 1.9-2.5/5), demonstrating superior contextual understanding and localisation capabilities. This counterintuitive finding suggests that broad training on diverse data may be more valuable than medical specialisation for healthcare agent applications requiring real-world context and regulatory knowledge.
Submission Number: 167