Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: long paper (4–8 pages excluding references)
Keywords: contrastive learning, CLIP, transfer learning, representation learning, blood cancer, multi-omics
TL;DR: BLOOM-HiCLIP uses hierarchy-aware contrastive learning to align RNA and DNA embeddings from 8K+ blood cancers, enabling cross-modal imputation and robust diagnosis/prognosis transfer.
Abstract: Blood cancers are a major public health burden, affecting more than 10 million people worldwide. Genomic profiling has improved patient outcomes, but machine learning models still struggle to generalize because multi-omic cohorts are sparse, often lack entire modalities, and exhibit strong out-of-distribution (OOD) shifts across institutions and rare diagnoses. Here we introduce BLOOM-HiCLIP, the first hierarchical multi-omic CLIP framework for blood cancer, trained on the largest multi-omic blood cancer cohort to date, comprising over 8,200 tumors spanning 165 diagnoses. BLOOM-HiCLIP leverages biological foundation models to learn taxonomy-consistent patient representations by mapping RNA- and DNA-derived omics modalities to a shared latent space in which imputed and true embeddings have high cosine similarity. The resulting joint embedding space supports strong cross-modal retrieval (94.8% Recall@5) and effectively captures the hierarchical structure of blood cancer, outperforming a matched InfoNCE-trained ablation model. Notably, BLOOM-HiCLIP freezes all foundation encoders and trains only lightweight projection heads and pooling parameters (39.5M total), achieving strong performance with 33.4x fewer parameters than costly end-to-end finetuning. When transferred to a hierarchical diagnostic model, these contrastive embeddings achieve strong discrimination across the blood cancer hierarchy (98.1% cell-of-origin micro-averaged AUROC) and outperform baselines in fine-grained diagnosis across 29 subtypes in both in-distribution (34.8% top-1 accuracy) and OOD settings. The embeddings also transfer effectively to risk stratification, achieving an overall survival c-index of 0.788. BLOOM-HiCLIP demonstrates that hierarchical contrastive alignment can turn heterogeneous, incomplete multi-omic data into imputation-ready representations that reliably transfer to clinically relevant prediction tasks in blood cancer.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 40