Keywords: Large Language Models, Machine Learning for Health, Open Weights, Clinician Co-design
TL;DR: Llama-3-Meditron is an open-weight, continually realigned, state-of-the-art, clinician co-designed medical large language model.
Abstract: We introduce Llama-3-Meditron, a high-performing open-weight suite of medical large language models (LLMs) built on Llama-3.1 (8B and 70B). The models are pre-trained on a carefully curated medical corpus that includes textbooks, filtered PubMed Central articles, and Clinical Practice Guidelines. To enable robust reasoning and generalization, we synthesize a new dataset for instruction fine-tuning, combining multi-turn Q&A, adversarial questions, medical exams, and differential diagnostics. Additionally, we propose MediTree, an inference pipeline that leverages a Tree-of-Thoughts sampling strategy to boost the performance of our models. On widely used benchmarks (MedMCQA, MedQA, PubMedQA), Llama-3-Meditron-8B surpasses all Llama-3.1 models by over 3%, and the 70B-parameter model outperforms other medical and non-medical LLMs across all tasks, including Meditron 1 and 2, GPT-4 (fine-tuned), Flan-PaLM, and MedPaLM-2. These findings demonstrate that open-weight medical LLMs can set the state of the art in physician-level question answering, advancing the accessibility and usefulness of AI in healthcare.
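To make the Tree-of-Thoughts idea behind MediTree concrete, the sketch below shows a generic breadth-first search over candidate reasoning steps. This is a minimal illustration only, not the paper's MediTree pipeline, whose details are not given in this abstract; `generate_thoughts`, `score_thought`, and the `depth`/`breadth`/`keep` parameters are hypothetical stand-ins for the model calls such a pipeline would make.

```python
# Minimal sketch of a Tree-of-Thoughts-style sampling loop (assumed structure,
# not the paper's actual MediTree implementation).
import random


def generate_thoughts(state: str, k: int) -> list[str]:
    # Hypothetical: sample k candidate next reasoning steps from an LLM.
    return [f"{state} -> step{random.randint(0, 999)}" for _ in range(k)]


def score_thought(state: str) -> float:
    # Hypothetical: ask the model (or a verifier) to rate a partial chain.
    return random.random()


def tree_of_thoughts(question: str, depth: int = 3,
                     breadth: int = 4, keep: int = 2) -> str:
    """Breadth-first tree search: expand each kept state into `breadth`
    candidates, score them, retain the top `keep`, and return the best
    reasoning chain after `depth` rounds."""
    frontier = [question]
    for _ in range(depth):
        candidates = [c for s in frontier for c in generate_thoughts(s, breadth)]
        candidates.sort(key=score_thought, reverse=True)
        frontier = candidates[:keep]
    return frontier[0]


if __name__ == "__main__":
    print(tree_of_thoughts("Patient presents with acute chest pain; differential?"))
```

Relative to single-path greedy decoding, a search of this shape trades extra model calls (roughly depth x keep x breadth generations plus scoring) for the ability to discard weak reasoning branches early, which is the general motivation for Tree-of-Thoughts-style inference.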
Submission Number: 61