Keywords: foundation model, representation learning, self-supervised learning, multimodal, precision medicine
TL;DR: A foundation model fusing genotypes, phenotypes, and retinal images for precision medicine
Abstract: Precision medicine aims to personalize disease prevention, prediction, and diagnosis by leveraging genomic patient data. Although patient genomes provide valuable predictive insight, they cannot capture the full complexity of an individual's health. Integrating genomics with additional patient data modalities, such as clinical phenotypes and medical imaging, enables more accurate and comprehensive disease modeling. We introduce PM1, a multimodal foundation model trained on genomic data from 438,668 individuals linked to 3,421 clinical and lifestyle traits and 211,416 retinal fundus photographs drawn from the UK Biobank and EyePACS cohorts. PM1 couples modality-specific encoders with a transformer encoder trained under an information noise-contrastive estimation (InfoNCE) objective that fuses the modalities into a joint latent space, together with generative modality decoders for cross-modal reconstruction and synthesis. A token-level masking schedule lets PM1 train on participants with any subset of modalities (in the UK Biobank, only ${\approx}6\%$ have all three), substantially expanding the effective training set. Joint modeling of retinal images, clinical traits, and genomic data outperforms single-modality and multimodal baselines: PM1 enables cross-modal genotype inference, improves predictive performance for retinal diseases and systemic conditions, and supports conditional generation of single nucleotide polymorphism (SNP) sequences and retinal images. As a group-level validation, a genome-wide association study (GWAS) on PM1's image-conditioned fusion embeddings recovers genome-wide significant pigmentation variants at the HERC2 locus.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 23987
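
The abstract's central mechanism, a fusion transformer fed per-modality tokens, with learnable mask tokens standing in for absent modalities and an InfoNCE objective aligning modality views, can be illustrated with a minimal PyTorch-style sketch. Everything here is an illustrative assumption rather than the authors' implementation: the class and function names (`FusionModel`, `info_nce`), the use of one token per modality, the dimensions, the mean pooling, and the symmetric pairwise contrastive pairing are all placeholders.

```python
# Minimal sketch of masked multimodal fusion with an InfoNCE objective,
# assuming a PyTorch setup. Names, shapes, and pooling are assumptions,
# not PM1's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionModel(nn.Module):
    def __init__(self, dim=256, n_heads=8, n_layers=4):
        super().__init__()
        # One learnable [MASK] token per modality (genome, phenotype, image)
        # stands in for that modality when a participant lacks it.
        self.mask_tokens = nn.Parameter(torch.randn(3, dim))
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens, present):
        # tokens:  (batch, 3, dim) embeddings from modality-specific encoders
        # present: (batch, 3) bool, True where the modality was observed
        x = torch.where(present.unsqueeze(-1), tokens, self.mask_tokens)
        fused = self.fusion(x)      # tokens fused in a joint latent space
        return fused.mean(dim=1)    # pooled fusion embedding

def info_nce(z_a, z_b, temperature=0.07):
    # Symmetric InfoNCE between two modality views of the same individuals:
    # the matching row is the positive, all other rows in the batch negatives.
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Under this reading, a single fusion transformer handles every modality subset because missing inputs are replaced by learned mask tokens rather than dropped, which is how a cohort where only about 6% of participants have all three modalities can still contribute in full to training.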