Bridging biomolecular modalities for knowledge transfer in bio-language models

Published: 11 Oct 2024, Last Modified: 02 Nov 2024
Venue: NeurIPS 2024 Workshop FM4Science (Poster)
License: CC BY 4.0
Keywords: Bio foundation models, Cross-modality knowledge transfer, Efficient model adaptation
TL;DR: mRNA foundation models (FMs) are few and trained on small datasets. This study explores adapting DNA and protein foundation models for mRNA tasks, demonstrating effective and efficient cross-modality knowledge transfer and highlighting crucial factors that affect this transfer.
Abstract: In biology, messenger RNA (mRNA) plays a crucial role in gene expression and protein synthesis. Accurate predictive modeling of mRNA properties can greatly enhance our understanding and manipulation of biological processes, leading to advancements in medical and biotechnological applications. Bio-language foundation models make it possible to leverage large-scale pretrained knowledge, which can significantly improve the efficiency and accuracy of these predictions. However, mRNA-specific foundation models are notably limited, posing challenges for efficient predictive modeling in mRNA-focused tasks. In contrast, the DNA and protein modalities have numerous general-purpose foundation models trained on billions of sequences. This paper explores the potential for adapting existing DNA and protein bio-language models to mRNA-focused tasks. Through experiments on various mRNA datasets curated from both the public domain and a proprietary internal database, we demonstrate that pretrained DNA and protein models can be effectively transferred to mRNA-focused tasks using adaptation techniques such as probing, full-rank finetuning, and low-rank finetuning. In addition, we identify key factors that influence successful adaptation, offering guidelines on when general-purpose DNA and protein models are likely to perform well on mRNA-focused tasks. We further assess the impact of model size on adaptation efficacy, finding that medium-scale models often outperform larger ones for cross-modal knowledge transfer. We conclude that by leveraging the interconnectedness of DNA, mRNA, and proteins, as outlined by the central dogma of molecular biology, the knowledge in foundation models can be effectively transferred across modalities, significantly expanding the repertoire of computational tools available for mRNA analysis.
Submission Number: 54