Choosing How to Adapt: An Empirical Study on Cross-Lingual Medical Question-Answering Adaptation

ACL ARR 2026 January Submission 2808 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Large Language Models, Medical Question Answering, Domain Adaptation, Supervised Fine-Tuning, Continual Pretraining, Multilingual Transfer
Abstract: The development of large language models (LLMs) has led to increased focus on adapting them to specialized domains and languages, yet the relative effectiveness of competing domain adaptation strategies remains unclear. We present an empirical study of medical domain adaptation using French medical question answering (QA) as a case study. We compare continual pretraining (CPT), supervised fine-tuning (SFT), and their combination (CPT+SFT) across three model families, multiple model sizes, and three initialization types, explicitly disentangling adaptation effects from base model choice. We evaluate both multiple-choice QA (MCQA) and open-ended QA (OEQA) under greedy and constrained decoding, using automatic metrics and LLM-as-a-Judge evaluation. For MCQA, CPT+SFT most often achieves the best scores, but its gains over SFT alone are small and frequently not statistically significant, making SFT a strong and cost-effective default. For OEQA, CPT consistently improves overlap-based metrics, whereas SFT often degrades generation quality; LLM-based evaluation prefers instruction-tuned models and CPT+SFT. Cross-lingual experiments further show that adaptation on French data transfers effectively to English benchmarks. Overall, we provide practical guidelines for selecting adaptation strategies under computational constraints.
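The abstract contrasts greedy and constrained decoding for MCQA evaluation. As a minimal sketch of what constrained decoding means in this setting, the snippet below scores each answer option's log-likelihood under a HuggingFace causal LM and returns the highest-scoring one; the checkpoint name, prompt template, and exact scoring scheme are illustrative assumptions, not the paper's setup.

```python
# A minimal sketch of constrained decoding for MCQA: rather than generating
# freely, score each candidate option and return the most likely one.
# The checkpoint name and prompt template are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-org/medical-llm"  # hypothetical; not the paper's checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def constrained_mcqa(question: str, options: dict[str, str]) -> str:
    """Pick the option key (e.g. 'A') whose text the model scores highest."""
    prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items()) + "\nAnswer:"
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    scores = {}
    for key in options:
        ids = tokenizer(prompt + " " + key, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # Shift so position t predicts token t+1, then sum log-probs over
        # the answer tokens only (everything after the shared prompt).
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = ids[0, 1:]
        scores[key] = log_probs[prompt_len - 1:].gather(
            -1, targets[prompt_len - 1:].unsqueeze(-1)
        ).sum().item()
    return max(scores, key=scores.get)
```

Under greedy decoding, by contrast, the model generates text freely and the answer letter must be parsed from the output, which is where formatting failures can depress MCQA scores independently of model knowledge.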
Paper Type: Long
Research Area: Language Models
Research Area Keywords: applications, fine-tuning, continual learning, prompting, robustness, transfer
Contribution Types: Model analysis & interpretability, Reproduction study, Publicly available software and/or pre-trained models
Languages Studied: French, English
Submission Number: 2808