Hierarchical Multi-Omic CLIP for Missing-Modality Imputation & Transfer Learning in Blood Cancers

Published: 02 Mar 2026, Last Modified: 08 May 2026 | MLGenX 2026 Poster | Everyone | CC BY 4.0
Abstract: Blood cancers are a major public health burden, affecting more than 10 million people worldwide. Genomic profiling has improved patient outcomes, but machine learning models still struggle to generalize because multi-omic cohorts are sparse, often lack entire modalities, and exhibit strong out-of-distribution (OOD) shifts across institutions and rare diagnoses. Here we introduce BLOOM-HiCLIP, the first hierarchical multi-omic CLIP framework for blood cancer, trained on the largest multi-omic blood cancer cohort to date, comprising over 8,200 tumors spanning 165 diagnoses. BLOOM-HiCLIP leverages biological foundation models to learn taxonomy-consistent patient representations by mapping RNA- and DNA-derived omics modalities to a shared latent space, in which imputed embeddings have high cosine similarity to the true embeddings. The resulting joint embedding space supports strong cross-modal retrieval (94.8% Recall@5) and effectively captures the hierarchical structure of blood cancer, outperforming a matched InfoNCE-trained ablation model. Notably, BLOOM-HiCLIP freezes all foundation encoders and trains only lightweight projection heads and pooling parameters (39.5M total), achieving strong performance with 33.4x fewer parameters than costly end-to-end fine-tuning. When transferred to a hierarchical diagnostic model, these contrastive embeddings achieve strong discrimination across the blood cancer hierarchy (98.1% cell-of-origin micro-averaged AUROC) and outperform baselines in fine-grained diagnosis across 29 subtypes in both in-distribution (34.8% top-1 accuracy) and OOD settings. The embeddings also transfer effectively to risk stratification, achieving an overall survival c-index of 0.788. BLOOM-HiCLIP demonstrates that hierarchical contrastive alignment can turn heterogeneous, incomplete multi-omic data into imputation-ready representations that reliably transfer to clinically relevant prediction tasks in blood cancer.
Submission Number: 32
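
For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of a plain symmetric InfoNCE (CLIP-style) objective and a Recall@k retrieval metric over paired embeddings from two modalities. It is an illustration only, not the BLOOM-HiCLIP implementation: the abstract does not specify the hierarchical loss, so this corresponds at most to the kind of non-hierarchical InfoNCE baseline it mentions. The function names, the temperature value of 0.07, and the toy vectors are all assumptions introduced here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(a, b):
    """Return sims[i][j] = cosine similarity between a[i] and b[j]."""
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]


def symmetric_info_nce(a, b, temperature=0.07):
    """Plain symmetric InfoNCE over paired embeddings (a[i] matches b[i]).

    Hypothetical helper: the temperature is an assumed default, not a value
    reported in the abstract.
    """
    sims = cosine_matrix(a, b)
    n = len(sims)

    def row_loss(logits, target):
        # Cross-entropy of a softmax over temperature-scaled similarities,
        # computed with the log-sum-exp trick for numerical stability.
        scaled = [s / temperature for s in logits]
        m = max(scaled)
        log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
        return log_z - scaled[target]

    # Direction 1: each a[i] should retrieve b[i]; direction 2: the reverse.
    a_to_b = sum(row_loss(sims[i], i) for i in range(n)) / n
    b_to_a = sum(
        row_loss([sims[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (a_to_b + b_to_a)


def recall_at_k(a, b, k=5):
    """Fraction of queries a[i] whose true match b[i] ranks in the top k."""
    sims = cosine_matrix(a, b)
    hits = 0
    for i, row in enumerate(sims):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(sims)


# Toy stand-ins for projected RNA and DNA embeddings (made-up values).
rna = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
dna_aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
dna_shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]
```

In this toy setup, well-aligned pairs give a lower loss and higher Recall@k than mismatched pairs, which is the behaviour a contrastive objective is meant to encourage in the shared latent space.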