BioVERSE: Representation Alignment of Biomedical Modalities to LLMs for Multi-Modal Reasoning

Ching-Huei Tsou; Michal Ozery-Flato; Ella Barkan; Diwakar Mahajan; Ben Shapira

15 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Representation alignment, Multi-modal reasoning, Semantic grounding, biomedical AI, Foundation models, Large language models (LLMs), Multi-modal representation learning

TL;DR: BioVERSE aligns pretrained biomedical foundation models with LLM semantic space, bridging raw multi-modal biomedical data and language for flexible, principled reasoning.

Abstract: Large language models (LLMs) and biomedical foundation models (BioFMs) have recently achieved strong results in biological text reasoning, molecular modeling, and single-cell analysis, yet they remain siloed in disjoint embedding spaces, which limits cross-modal reasoning. We present BioVERSE (Biomedical Vector Embedding Realignment for Semantic Engagement), a two-stage approach that adapts pretrained BioFMs as modality encoders and aligns them with LLMs through lightweight, modality-specific projection layers. The first stage aligns each modality to a shared LLM space through independently trained projections, allowing the modalities to interoperate naturally; the second stage applies standard instruction tuning with multi-modal data to bring them together for downstream reasoning. By unifying raw biomedical data with knowledge embedded in LLMs, the approach enables zero-shot annotation, cross-modal question answering, and interactive, explainable dialogue. Across tasks spanning cell-type annotation, molecular description, and protein function reasoning, compact BioVERSE configurations with smaller LLMs surpass larger LLM baselines while enabling richer, generative outputs than existing BioFMs, establishing a foundation for principled multi-modal biomedical reasoning.
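
The first stage described in the abstract (training a lightweight projection that maps a frozen encoder's embeddings into the LLM's space) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state the projection architecture or the alignment objective, so the linear map, the mean-squared-error loss, the toy vectors, and all names (`project`, `train_projection`, `pairs`) are assumptions chosen only to make the idea concrete.

```python
import random

def project(W, b, x):
    """Map an encoder embedding x (length d_in) into the target space (length d_out)."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def mse(y, t):
    """Mean squared error between a projected vector y and a target vector t."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(y)

def mean_loss(W, b, pairs):
    """Average alignment loss over all (encoder embedding, target embedding) pairs."""
    return sum(mse(project(W, b, x), t) for x, t in pairs) / len(pairs)

def train_projection(pairs, d_in, d_out, lr=0.1, epochs=200, seed=0):
    """Fit only the projection (W, b) by SGD; the encoder and LLM are treated as frozen.

    Assumption: MSE to a target embedding stands in for the (unstated) alignment loss.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    b = [0.0] * d_out
    for _ in range(epochs):
        for x, t in pairs:
            y = project(W, b, x)
            for i in range(d_out):
                # Gradient of the MSE with respect to output component i.
                g = 2.0 * (y[i] - t[i]) / d_out
                for j in range(d_in):
                    W[i][j] -= lr * g * x[j]
                b[i] -= lr * g
    return W, b

# Hypothetical toy data: three "encoder embeddings" (d_in = 3) paired with
# the "LLM-space" vectors (d_out = 2) they should be aligned to.
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, 1.0]),
    ([0.0, 0.0, 1.0], [1.0, 1.0]),
]

W0, b0 = train_projection(pairs, d_in=3, d_out=2, epochs=0)  # untrained baseline
W, b = train_projection(pairs, d_in=3, d_out=2)              # trained projection
loss_before = mean_loss(W0, b0, pairs)
loss_after = mean_loss(W, b, pairs)
```

In this sketch only `W` and `b` are updated, which mirrors the abstract's point that the projections are lightweight and trained independently per modality; a second projection for another modality would be fit the same way against the same target space. The second stage (instruction tuning on multi-modal data) is not shown.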

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 6267
