GEMCONT: Genetics-based Multimodal Contrastive Learning for Disease-Focused Imaging Genetics
Keywords: Multimodal contrastive learning, Imaging genetics, Genome-wide association studies, Medical imaging, Machine learning–derived phenotypes
TL;DR: We propose GEMCONT, a genetics-based multimodal contrastive framework that aligns medical imaging with disease-specific variants to improve phenotype predictability and recovery of genetic associations.
Abstract: Genetic association studies have identified thousands of variants linked to complex traits, yet their functional impact remains poorly understood. High-content phenotypic data, such as medical imaging, offer a new avenue to bridge this gap by capturing the downstream effects of disease variants on tissue phenotypes. However, extracting structured, disease-relevant phenotypes for genetic analysis remains a challenge, as genetic signals are weak and sparse.
To address this, we introduce $\textbf{GEMCONT}$, a genetics-based multimodal contrastive learning framework that co-embeds genotype and phenotype data into a shared latent space, prioritizing disease-relevant variation. Unlike standard contrastive approaches, GEMCONT accounts for the sparse and subtle nature of genetic effects. We validate GEMCONT in controlled simulations as well as in UK Biobank spirometry and fundus data, demonstrating improved phenotype predictability and enhanced recovery of genetic associations compared to variational autoencoders, self-supervised contrastive models, and a PRS-based multimodal baseline.
Altogether, our results highlight the potential of multimodal contrastive learning to refine ML-derived phenotypes for large-scale genetic studies.
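As a rough illustration of what co-embedding genotype and phenotype data in a shared latent space can look like, the sketch below computes a generic symmetric InfoNCE (CLIP-style) loss over paired embeddings. It is not GEMCONT's objective: the abstract states that GEMCONT departs from standard contrastive approaches to handle sparse, subtle genetic effects, and those modifications are not described here. The function names, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_info_nce(geno_emb, pheno_emb, temperature=0.1):
    """Generic symmetric InfoNCE loss over paired embeddings.

    Row i of geno_emb and row i of pheno_emb are assumed to come from the
    same individual (the positive pair); every other row in the batch acts
    as a negative. The temperature is a hypothetical default.
    """
    g = [l2_normalize(v) for v in geno_emb]
    p = [l2_normalize(v) for v in pheno_emb]
    n = len(g)
    # logits[i][j] = cosine similarity of genotype i and phenotype j, scaled
    logits = [
        [sum(a * b for a, b in zip(g[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Genotype -> phenotype direction: softmax over each row
    g2p = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Phenotype -> genotype direction: softmax over each column
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    p2g = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (g2p + p2g)


# Toy embeddings (made up): three individuals, two latent dimensions
geno = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[2.0, 0.1], [0.1, 3.0], [-1.5, 0.0]]
shuffled = [aligned[1], aligned[2], aligned[0]]  # pairing deliberately broken

loss_aligned = symmetric_info_nce(geno, aligned)
loss_shuffled = symmetric_info_nce(geno, shuffled)
```

On these toy inputs the correctly paired embeddings give a lower loss than the shuffled ones, which is the behavior a contrastive objective relies on to pull each individual's genotype and phenotype representations together.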
Primary Subject Area: Integration of Imaging and Clinical Data
Secondary Subject Area: Unsupervised Learning and Representation Learning
Submission Number: 321