A Multi-Modal Foundation Model Across Species for Interpreting Gene Functions

Published: 06 Oct 2025, Last Modified: 06 Oct 2025
NeurIPS 2025 2nd Workshop FM4LS Poster
License: CC BY 4.0
Keywords: DNA Sequence Model, Large Language Model, Multi-Modal Machine Learning, Sequence-to-Function Model, Genome Annotation
TL;DR: A multi-modal foundation model for gene annotation and embedding generation
Abstract: Artificial intelligence has shown impressive performance in computational biology, especially in modelling DNA sequences and processing biomedical text annotations. However, most current computational methods analyze these two modalities separately and rarely address their integration, for example, how to use gene annotations for functional inference of under-explored genes across different species. To interpret the contributions of modality representations in learning and predicting patterns in genomics and genetics, we develop DNACLIP and train it on paired DNA sequences and text descriptions from over 300,000 genes across 24 species, to model text and DNA sequences jointly and perform cross-species gene functional analysis. Through extensive benchmarking, we show the distinct contributions of aligned gene embeddings and text embeddings in various downstream applications, including gene clustering, gene annotation, disease risk prediction, function prediction, perturbation prediction, and expression prediction. We also use DNACLIP to discover disease-specific gene programs from atlas data. Finally, we discuss the areas where modality-specific embeddings dominate and provide guidelines for selecting embeddings based on user requirements.
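The abstract does not specify DNACLIP's training objective, but models in the CLIP family typically align paired embeddings from two encoders with a symmetric contrastive (InfoNCE) loss, so that each gene's DNA embedding is pulled toward its own text description and pushed away from the descriptions of other genes in the batch. A minimal NumPy sketch of that loss, with all names and the temperature value purely illustrative:

```python
import numpy as np

def clip_style_loss(dna_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired DNA/text embeddings.

    dna_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products become cosine similarities
    dna = dna_emb / np.linalg.norm(dna_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = dna @ txt.T / temperature       # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # matched pair i sits on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # stabilize softmax
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average the DNA->text and text->DNA retrieval losses
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Under this objective, correctly paired batches score lower loss than mispaired ones, which is what makes the learned embedding spaces of the two modalities comparable for the cross-species retrieval and annotation tasks the abstract describes.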
Submission Number: 1