Keywords: Multilingual Medical Question Answering, Medical Datasets
Abstract: This paper investigates, for the first time, Multilingual Medical Question Answering across high-resource (English, Spanish, French, Italian) and low-resource (Basque, Kazakh) languages. We evaluate three types of external evidence: local repositories, dynamically web-retrieved content, and LLM-generated explanations, using models of varying sizes. Our results show that larger models consistently perform the task better in English, both in the baseline evaluations and when external knowledge is added. Interestingly, retrieving evidence in English often surpasses language-specific retrieval, even for non-English queries. These findings challenge the assumption that language-specific external knowledge uniformly improves performance and reveal that effective strategies depend both on the source of language resources and on model scale. Furthermore, specialized static repositories such as PubMed are limited: while they provide authoritative expert knowledge, they lack adequate multilingual coverage and do not fully address the reasoning demands of the task.
Primary Area: datasets and benchmarks
Submission Number: 21527