Generalized Representation Learning for Multimodal Histology Imaging Data Through Vision-Language Modeling

Published: 06 Mar 2025, Last Modified: 18 Apr 2025
Venue: ICLR 2025 Workshop LMRL
License: CC BY 4.0
Track: Tiny Paper Track
Keywords: multiplexed spatial proteomics, digital pathology, contrastive learning, vision language modeling
Abstract: We introduce a trimodal vision-language framework that unifies multiplexed spatial proteomics (SP), H&E histology, and textual metadata in a single embedding space. A specialized transformer-based SP encoder, alongside pretrained H&E and language models, captures diverse morphological, molecular, and semantic signals. Preliminary results demonstrate improved retrieval, zero-shot classification, and patient-level phenotype predictions, indicating the promise of this multimodal approach for deeper insights and translational applications in digital pathology.
Attendance: Jacob Leiby
Submission Number: 49
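
The abstract describes aligning three modalities (SP, H&E, text) in one embedding space with contrastive learning, but it does not state the training objective. As a minimal sketch only, assuming a CLIP-style symmetric InfoNCE loss averaged over the three modality pairs, the idea could look like the following. The function names, the temperature value, and the equal pair weighting are all hypothetical and are not taken from the paper; the encoders themselves are out of scope, and plain lists of floats stand in for their outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _mean_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(batch_a, batch_b, temperature=0.07):
    """Symmetric InfoNCE between two batches; row i of each batch is a matched pair."""
    a = [l2_normalize(v) for v in batch_a]
    b = [l2_normalize(v) for v in batch_b]
    # Cosine similarity of every a[i] with every b[j], scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(va, vb)) / temperature for vb in b]
        for va in a
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_cross_entropy(logits) + _mean_cross_entropy(transposed))


def trimodal_loss(sp, he, text, temperature=0.07):
    """Average the pairwise losses over SP/H&E, SP/text, and H&E/text (assumed weighting)."""
    return (
        info_nce(sp, he, temperature)
        + info_nce(sp, text, temperature)
        + info_nce(he, text, temperature)
    ) / 3.0


# Toy stand-ins for encoder outputs: two samples, two dimensions each.
sp_emb = [[1.0, 0.0], [0.0, 1.0]]
he_emb = [[1.0, 0.0], [0.0, 1.0]]
text_aligned = [[1.0, 0.0], [0.0, 1.0]]
text_shuffled = [[0.0, 1.0], [1.0, 0.0]]

aligned = trimodal_loss(sp_emb, he_emb, text_aligned)
shuffled = trimodal_loss(sp_emb, he_emb, text_shuffled)
```

In this toy setup the loss is lower when the text embeddings line up with their SP and H&E counterparts than when they are shuffled, which is the behaviour a shared embedding space for retrieval and zero-shot classification relies on.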