BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models

BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models

ICML 2025 Workshop FM4LS Submission30 Authors

Published: 12 Jul 2025, Last Modified: 12 Jul 2025FM4LS 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Multimodal omics integration, pre-trained language models, molecular property prediction, multi -model representation

Abstract: We present BioLangFusion, a simple approach for integrating pre-trained DNA, mRNA, and protein language models into unified molecular representations. Motivated by the central dogma of molecular biology (information flow from gene to transcript to protein), we align per-modality embeddings at the biologically meaningful codon level (three nucleotides encoding one amino acid) to ensure direct cross-modal correspondence. BioLangFusion studies three standard fusion techniques: (i) codon-level embedding concatenation, (ii) entropy-regularized attention pooling inspired by multiple-instance learning, and (iii) cross-modal multi-head attention—each technique providing a different inductive bias for combining modality-specific signals. These methods require no additional pre-training or modification of the base models, allowing straightforward integration with existing sequence based foundation models. Across five molecular property prediction tasks, BioLangFusion outperforms strong unimodal baselines, showing that even simple fusion of pre-trained models can capture complementary multi-omic information with minimal overhead.

Submission Number: 30

Loading