MESH-HR: Multimodal Fusion of Somatic DNA Profiles and Histopathology for Continuous Breast Cancer Receptor Subtyping via LLM-Assisted Annotation

Published: 28 May 2026, Last Modified: 28 May 2026
Venue: ICML 2026 FM4LS Workshop Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Breast cancer, receptor subtyping, Multimodal learning, Somatic DNA / genomics, Whole-slide imaging (WSI), Histopathology, Attention-based multiple instance learning (ABMIL), XGBoost, LLM-assisted annotation (GPT-4), Cancers of unknown primary (CUP), Zero-shot generalization, TCGA-BRCA, Continuous / probabilistic predictions, Survival stratification, IHC-free prediction, Computational pathology
TL;DR: MESH-HR predicts breast cancer receptor status (ER, PR, HER2) from somatic DNA profiles and pathology images, with no IHC needed. It outperforms single-modality models and generalizes zero-shot to cancers of unknown primary.
Abstract: Breast cancer receptor subtyping determines eligibility for targeted therapies based on estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression, which is typically assessed via immunohistochemistry (IHC). We introduce MESH-HR (Multimodal Ensemble of Somatic Variants and H&E Slides for Hormone Receptor subtyping), a multimodal framework that integrates two biological modalities, somatic DNA profiles and whole-slide histopathology images, to predict receptor status probabilities without IHC. To enable large-cohort training without manual curation, ground-truth receptor labels are extracted at scale from unstructured pathology reports using a GPT-4–based LLM pipeline. MESH-HR combines an attention-based multiple instance learning (ABMIL) vision encoder with an XGBoost model over somatic features, fusing complementary signal across DNA and imaging modalities. Trained on more than 1,300 breast cancers, MESH-HR achieves AUCs of 0.90 (ER), 0.84 (PR), and 0.96 (HER2) on held-out data, outperforming unimodal baselines, and generalizes zero-shot to The Cancer Genome Atlas Breast Cancer cohort (TCGA-BRCA). Continuous probabilistic outputs improve survival stratification over binary clinical labels, recovering prognostic signal lost under discrete IHC thresholds. We further apply MESH-HR zero-shot to cancers of unknown primary (CUP), a population in which receptor profiling is rarely performed due to the absence of a known tissue of origin, obtaining biologically consistent, survival-predictive receptor estimates that support receptor-informed personalized treatment stratification.
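To make the ABMIL component named in the abstract more concrete, the following is a minimal pure-Python sketch of attention-based MIL pooling over tile features, followed by a simple probability-level fusion step. The abstract does not specify how the image and DNA branches are fused, so the weighted average in `late_fusion` is only an illustrative assumption, and the XGBoost branch is represented by a plain probability rather than a trained model. The names `abmil_pool` and `late_fusion`, and the parameters `V`, `w`, and `weight`, are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def abmil_pool(tiles, V, w):
    """Attention-based MIL pooling (sketch).

    tiles: list of tile feature vectors, each of length d
    V:     projection matrix as a list of rows, each of length d (assumed learned)
    w:     attention vector with one entry per row of V (assumed learned)
    Returns the attention-weighted slide embedding and the attention weights.
    """
    scores = []
    for tile in tiles:
        # Hidden representation: tanh(V @ tile)
        hidden = [math.tanh(sum(v * x for v, x in zip(row, tile))) for row in V]
        # Unnormalized attention score: w . hidden
        scores.append(sum(wi * hi for wi, hi in zip(w, hidden)))
    attn = softmax(scores)
    dim = len(tiles[0])
    slide = [sum(a * tile[j] for a, tile in zip(attn, tiles)) for j in range(dim)]
    return slide, attn


def late_fusion(p_image, p_dna, weight=0.5):
    """Assumed fusion rule: weighted average of two branch probabilities."""
    return weight * p_image + (1.0 - weight) * p_dna


# Toy usage with made-up values for three tiles of dimension 2.
tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slide_embedding, attention = abmil_pool(tiles, V=[[1.0, -1.0]], w=[2.0])
fused_probability = late_fusion(p_image=0.8, p_dna=0.6)
```

In this sketch the attention weights sum to one, so the slide embedding is a convex combination of tile features. A trained system would learn `V` and `w` jointly with a classifier head on the slide embedding.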
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 15