Rapid and Reproducible Multimodal Biological Foundation Model Development with AIDO.ModelGenerator

Caleb Ellington; Dian Li; Shuxian Zou; Elijah Cole; Ning Sun; Sohan Addagudi; Le Song; Eric P. Xing

Published: 11 Jun 2025, Last Modified: 18 Jul 2025. Venue: GenBio 2025 Spotlight. License: CC BY 4.0

Keywords: Foundation Models, Multimodal Fusion, Prototyping, Reproducibility

TL;DR: AIDO.ModelGenerator is a reproducible, YAML-driven framework that enables rapid development, benchmarking, and deployment of multimodal biological foundation models across 300+ datasets and 30+ pretrained backbones.

Abstract: Foundation models (FMs) for DNA, RNA, proteins, cells, and tissues have begun to close long-standing performance gaps in biological prediction tasks, yet each modality is usually studied in isolation. Bridging them requires software that can ingest heterogeneous data, apply large pretrained backbones from various sources, and perform multimodal benchmarking studies at scale. We present [AIDO.ModelGenerator](https://genbio-ai.github.io/ModelGenerator/), an open-source toolkit that turns these needs into declarative experiment recipes through a structured experimental framework. AIDO.ModelGenerator provides (i) 300+ datasets covering DNA, RNA, protein, cell, spatial, and multimodal data types; (ii) 30+ pretrained FMs ranging from 3M to 16B parameters; (iii) 10+ plug-and-play use cases covering inference, adaptation, prediction, generation, and zero-shot evaluation; and (iv) YAML-driven experiment recipes that enable exact reproducibility. On a sequence-to-expression prediction task, AIDO.ModelGenerator systematically builds and tests unimodal and multimodal models, achieving a new state of the art (SOTA) with a combination of DNA and RNA FMs that outperforms unimodal baselines by over $10\%$. In a Crohn’s disease case study, the framework’s simulated knockout protocol ranks the clinically implicated target SOX4 6,000 positions higher than differential-expression baselines do, illustrating its utility for therapeutic target discovery. We release code, tutorials, checkpoints, datasets, and an API reference to accelerate multimodal FM research in the life sciences.

Submission Number: 103
