Training Dynamics of Large Language Models Under Domain Adaptation in Clinical Domains

ACL ARR 2026 January Submission9711 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: domain adaptation, continual learning, pre-training, automatic evaluation, evaluation methodologies, multilingualism, corpus creation, efficient models, language change, healthcare applications, clinical NLP
Abstract: While scaling laws for general-purpose language models are well-documented, the empirical trajectories governing their specialized adaptation to high-stakes clinical environments remain poorly understood. In this study, we systematically characterize the adaptation trajectories of the \textit{Qwen2.5} and \textit{Qwen3} model families within the German medical domain through continuous pre-training and model merging. We evaluate performance through a dual-metric lens, combining objective knowledge benchmarks via multiple-choice question answering with an assessment of generative proficiency through automated preference rankings. Our results reveal a critical asynchronous scaling behavior between factual recall and generative proficiency; while models achieve rapid stylistic alignment within the first 2B to 3B tokens, objective knowledge acquisition scales more gradually and exhibits significantly shallower improvement curves. Furthermore, our results demonstrate that once a sufficient architectural capacity is reached, domain-specific training allows models to bridge the performance gap to significantly larger generalist counterparts, challenging the assumption that raw parameter scale is the primary determinant of domain proficiency. These findings demonstrate that domain proficiency is not a monolithic acquisition process but a series of decoupled trajectories, providing an empirical blueprint for compute-optimal specialization in resource-constrained clinical environments.
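The abstract mentions model merging alongside continued pre-training. A minimal sketch of one common form of weight-space merging, linear interpolation between a generalist and a domain-adapted checkpoint, is shown below; the function name, the interpolation scheme, and the toy parameters are illustrative assumptions, since the paper's exact merging recipe is not stated here.

```python
# Hedged sketch: linear weight-space merging of two checkpoints.
# Real checkpoints hold tensors per parameter name; scalars stand in here
# so the example stays self-contained.

def merge_linear(base, adapted, alpha=0.5):
    """Interpolate parameters: (1 - alpha) * base + alpha * adapted."""
    assert base.keys() == adapted.keys(), "checkpoints must share parameter names"
    return {name: (1 - alpha) * base[name] + alpha * adapted[name]
            for name in base}

# Toy example: a generalist checkpoint and a clinically adapted one.
general = {"layer.weight": 1.0, "layer.bias": 0.0}
clinical = {"layer.weight": 3.0, "layer.bias": 2.0}
merged = merge_linear(general, clinical, alpha=0.5)
# merged["layer.weight"] -> 2.0, merged["layer.bias"] -> 1.0
```

With `alpha` swept between 0 and 1, this kind of merge trades general-domain ability against clinical specialization, which is one practical lever behind the compute-optimal specialization the abstract argues for.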
Paper Type: Long
Research Area: Language Models
Research Area Keywords: domain adaptation, continual learning, pre-training, automatic evaluation, evaluation methodologies, multilingualism, efficient models, language change, healthcare applications, clinical NLP
Contribution Types: Model analysis & interpretability, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models
Languages Studied: German
Submission Number: 9711