Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics,
which is not typically found in general-purpose large multimodal models. Building upon Gemini’s
multimodal models, we develop several models within the new Med-Gemini family that inherit core
capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology,
histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for
AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding the previous best results
on two separate datasets by absolute margins of 1% and 12%, respectively; 57% and 96% of AI reports
on normal cases, and 43% and 65% on abnormal cases, were evaluated as “equivalent or better” than
the original radiologists’ reports. We demonstrate the first-ever large multimodal model-based report
generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports
considered clinically acceptable, although additional research is needed to meet expert radiologist
reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance
in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA,
exceeding state-of-the-art (SoTA) results or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology
image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches
task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard
linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically
correlated diseases on which it was never trained. Although further development and evaluation
are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini
across a wide range of medical tasks.