Flow matching for generative modeling in bioinformatics and computational biology

Alex Morehead, Lazar Atanackovic, Akshata Hegde, Yanli Wang, Frimpong Boadu, Joel Selvaraj, Alexander Tong, Aditi Krishnapriyan, Jianlin Cheng

Published: 03 Dec 2025, Last Modified: 19 Dec 2025 | Crossref | License: CC BY-SA 4.0
Abstract: Many problems in bioinformatics and computational biology can be framed as learning a mapping from one state of a biological system to another relevant state, or as exploring novel data points across biologically constrained spaces. However, manually deriving such mappings (e.g., to transform cells in a diseased state back into a healthy state) or extrapolating from existing datasets to create new data is often nontrivial and can require extraordinary domain expertise and resources. Fortunately, the field of generative artificial intelligence (AI) has introduced a training paradigm referred to as (conditional) flow matching, which has emerged as a promising solution to this problem, with broad applicability in computer vision, natural language processing, and the physical and life sciences. Flow matching is a principled, data-driven framework for efficiently learning a mapping between arbitrary pairs of high-dimensional data distributions, making it well suited to problems in molecular and cell biology. In this Review, we characterize the theoretical foundations of flow matching and its applications in biomolecular modeling for proteins, DNA/RNA, small molecules, and their interactions, as well as its uses in single- and multi-cellular modeling for cell phenotyping and imaging, each contributing towards the development of an AI-based virtual cell. Lastly, we highlight open-source flow matching methods and discuss future directions in flow-based generative modeling for bioinformatics and computational biology.
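As a rough illustration of what "learning a mapping between a pair of distributions" can look like in practice, the sketch below builds the regression targets used in one common formulation of conditional flow matching. The abstract does not specify a particular formulation, so the straight-line interpolation path, the function names `cfm_training_pairs` and `cfm_loss`, and the toy 2-D Gaussian data are all assumptions made here for illustration, not the authors' method.

```python
import numpy as np

def cfm_training_pairs(x0, x1, rng):
    """Sample points on straight-line paths between paired samples.

    Assumption: linear interpolation x_t = (1 - t) * x0 + t * x1, whose
    velocity along the path is the constant x1 - x0.
    """
    t = rng.uniform(size=(x0.shape[0], 1))   # one time in [0, 1) per pair
    xt = (1.0 - t) * x0 + t * x1             # point on the path at time t
    ut = x1 - x0                             # target velocity at that point
    return t, xt, ut

def cfm_loss(v_pred, ut):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean(np.sum((v_pred - ut) ** 2, axis=1)))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 2))               # source samples (toy data)
x1 = rng.normal(size=(256, 2)) + 4.0         # target samples (toy data)
t, xt, ut = cfm_training_pairs(x0, x1, rng)

# A model v(x_t, t) would be trained to minimize cfm_loss(v(xt, t), ut).
# Two reference points: a perfect predictor and an all-zero predictor.
perfect = cfm_loss(ut, ut)
baseline = cfm_loss(np.zeros_like(ut), ut)
```

In an actual application the velocity predictor would be a neural network conditioned on `xt` and `t`, and new samples would be generated by integrating the learned velocity field from source to target; both steps are omitted here to keep the sketch minimal.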