TL;DR: This paper introduces a trimodal contrastive learning framework that enables fine-grained histopathology annotation by learning from gene expression data.
Abstract: The characterization of histopathology with AI promises to assist clinical decision-making, but it is currently limited by coarse-grained annotations that miss cellular identities. To overcome this gap, we bridge histopathological images, gene expression profiles, and natural-language descriptions using *SpatialWhisperer*, a trimodal contrastive learning model. Our training integrates community-scale datasets comprising spatially resolved gene expression profiles paired with histopathology images, as well as single-cell gene expression profiles with detailed annotations. The shared gene expression modality implies a transitive relationship between images and textual annotations, which our method leverages to enable accurate zero-shot cell type annotation directly from H&E images. *SpatialWhisperer* outperforms published baselines, achieving relative AUROC gains of up to 15.9% across three benchmarks spanning 19 tissues and 20 cell types. When training with data from all three modality pairs, we observe performance gains in low-data regimes. We formalize our approach and present a sufficient condition under which this transitive alignment is induced. Our work establishes *transitive representation learning* for fine-grained interpretation of histopathology images.
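As a rough illustration of the training idea in the abstract, the sketch below sums a contrastive loss over the two available pair types (image with gene expression, gene expression with text). It assumes a CLIP-style symmetric InfoNCE objective and precomputed embeddings; the abstract does not state the exact loss, and the function names, the temperature value, and the equal weighting of the two terms are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(z, axis):
    # Numerically stable log-softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def symmetric_info_nce(a, b, temperature=0.07):
    # Assumed CLIP-style loss: row i of `a` is the positive match for row i
    # of `b`; all other rows in the batch act as negatives.
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

def trimodal_loss(img, gene_spatial, gene_single_cell, text, temperature=0.07):
    # Hypothetical combined objective: image-gene pairs (spatial data) plus
    # gene-text pairs (annotated single-cell data). No image-text term is
    # used here; alignment between those two would arise only transitively
    # through the shared gene expression encoder.
    return (symmetric_info_nce(img, gene_spatial, temperature)
            + symmetric_info_nce(gene_single_cell, text, temperature))
```

In this sketch the gene expression embeddings in both terms would come from the same encoder, which is what ties the image and text spaces together. A third image-text term could be added in the same way when such pairs are available.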
Lay Summary: When pathologists diagnose cancer, they examine stained tissue under a microscope to identify what kinds of cells are present, a slow process where AI tools could help. But current pathology AIs recognize patterns only across larger tissue regions, because training at the single-cell level requires costly manual cell annotation.
We address this limitation by leveraging another data domain in which cells have been labeled extensively: gene expression. Our key idea is to transfer these annotations to pathology images by combining labeled gene expression data with histopathology images for which paired gene expression data exists. This paired data is becoming increasingly available through a technology called spatial transcriptomics.
Our model, *SpatialWhisperer*, learns this three-way correspondence from one million image-gene pairs and one million gene-label pairs, without ever seeing image-label pairs directly. Across three test datasets covering 19 tissues and 20 cell types, it outperforms specialized pathology AIs by up to 16 percent. We expect this idea to become an integral component of future models that allow clinicians to identify individual cell types directly from routine pathology images. The same idea applies anywhere two kinds of data can be linked through a shared third.
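The zero-shot step described above can be pictured as a nearest-label lookup in the shared embedding space. The following is a minimal sketch under stated assumptions: images and label texts are already embedded in the same space, cosine similarity is the scoring rule, and the function name, the toy vectors, and the three cell type labels are hypothetical placeholders rather than outputs of the actual model.

```python
import numpy as np

def zero_shot_annotate(image_emb, label_emb, labels):
    # Normalize rows so the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    scores = img @ txt.T  # shape: (num_images, num_labels)
    # Assign each image the label whose text embedding is most similar.
    return [labels[j] for j in scores.argmax(axis=1)], scores

# Toy embeddings standing in for encoder outputs (illustrative values only).
labels = ["T cell", "fibroblast", "epithelial cell"]
label_emb = np.eye(3)
image_emb = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.2, 0.8]])

predicted, scores = zero_shot_annotate(image_emb, label_emb, labels)
print(predicted)  # ['T cell', 'epithelial cell']
```

No image-label pair is needed at this stage: the lookup works only to the extent that training has placed images and label texts near the same gene expression embeddings.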
Link To Code: https://github.com/zinagoodlab/spatialwhisperer
Primary Area: Applications->Health / Medicine
Keywords: Multimodal Learning, Contrastive Learning, Computational Biology, Computational Histopathology, Representation Learning, Zero-Shot Learning, Spatial Transcriptomics, AI for Science, Medical Imaging, Cross-Modal Transfer
Submission Number: 30167