Best of Both Worlds: Combining General and Clinical Language Models for Classification and Text Generation
Keywords: Efficient Domain Adaptation, Clinical Adaptation, Language Models
TL;DR: Efficient, training-free clinical domain adaptation of frontier large language models.
Track: Findings
Abstract: We study proxy tuning, a training-free, decoding-time method that combines the strengths of general and clinical language models. Across three classification and four text generation tasks, zero-shot proxy tuning consistently improves performance over baselines, yielding an average 6.5\% Macro-F1 gain over a large general model on classification tasks and surpassing a 70B clinical model on all generative tasks. Our analysis reveals that proxy-tuning configurations that isolate the effect of clinical continued pretraining produce the largest gains on medical knowledge-intensive tasks. We additionally introduce Cross-Architecture Proxy Tuning (CAPT), which enables proxy tuning across models with different architectures and limited access to logit distributions. CAPT with a new-generation base model (Qwen3-30B) achieves performance comparable to supervised fine-tuning on 2,600 samples for classification tasks and produces 90\% clinically safe outputs on generation tasks. Our findings demonstrate that proxy tuning offers a practical, efficient path to clinical domain adaptation without model retraining.
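
For readers unfamiliar with the mechanism, proxy tuning steers a large base model at decoding time by adding the logit difference between a small tuned expert and its untuned anti-expert to the base model's logits. Below is a minimal sketch of one decoding step, assuming the standard formulation (Liu et al., 2024) and the Hugging Face transformers API; the model names, greedy decoding, and alpha = 1.0 are illustrative assumptions, not this paper's exact configuration.

    # Minimal sketch of proxy-tuned decoding (logit arithmetic per Liu et al., 2024).
    # Model names below are placeholders, not the submission's actual setup.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    BASE = "meta-llama/Llama-2-70b-hf"        # large general base model (assumption)
    EXPERT = "clinical-expert-7b"             # small clinical expert (hypothetical name)
    ANTI_EXPERT = "meta-llama/Llama-2-7b-hf"  # small general anti-expert (assumption)

    tok = AutoTokenizer.from_pretrained(BASE)
    base = AutoModelForCausalLM.from_pretrained(BASE)
    expert = AutoModelForCausalLM.from_pretrained(EXPERT)
    anti = AutoModelForCausalLM.from_pretrained(ANTI_EXPERT)

    @torch.no_grad()
    def proxy_tuned_generate(prompt, max_new_tokens=128, alpha=1.0):
        ids = tok(prompt, return_tensors="pt").input_ids
        for _ in range(max_new_tokens):
            # Next-token logits from each model; plain proxy tuning assumes all
            # three models share one vocabulary (the constraint CAPT relaxes).
            l_base = base(ids).logits[:, -1, :]
            l_expert = expert(ids).logits[:, -1, :]
            l_anti = anti(ids).logits[:, -1, :]
            # Shift the base logits by the expert / anti-expert contrast.
            l_tuned = l_base + alpha * (l_expert - l_anti)
            next_id = l_tuned.argmax(dim=-1, keepdim=True)  # greedy; no KV cache for brevity
            if tok.eos_token_id is not None and next_id.item() == tok.eos_token_id:
                break
            ids = torch.cat([ids, next_id], dim=-1)
        return tok.decode(ids[0], skip_special_tokens=True)

The CAPT variant is deliberately not sketched here: the abstract states only its goal (handling different architectures and limited logit access), not its mechanism, which the paper itself defines.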
General Area: Applications and Practice
Specific Subject Areas: Natural Language Processing
Supplementary Material: zip
Data And Code Availability: Yes
Ethics Board Approval: No
Entered Conflicts: I confirm the above
Anonymity: I confirm the above
Submission Number: 241