Keywords: Multilingual Medical Question Answering, Medical Datasets
Abstract: This paper investigates, for the first time, Multilingual Medical Question Answering across high-resource (English, Spanish, French, Italian) and low-resource (Basque, Kazakh) languages. We evaluate three types of external evidence: local repositories, dynamically web-retrieved content, and LLM-generated explanations, using models of varying sizes. Our results show that larger models consistently perform the task better in English, both in the baseline evaluations and when external knowledge is added. Interestingly, retrieving evidence in English often surpasses language-specific retrieval, even for non-English queries. These findings challenge the assumption that language-specific external knowledge uniformly improves performance and reveal that effective strategies depend both on the source of language resources and on model scale. Furthermore, specialized static repositories such as PubMed are limited: while they provide authoritative expert knowledge, they lack adequate multilingual coverage and do not fully address the reasoning demands of the task.
Primary Area: datasets and benchmarks
Submission Number: 21527