Track: Full / long paper (5-8 pages)
Keywords: Discrete Flow Matching, Dirichlet Flow Matching, Fisher Flow Matching, Generative models, Dirichlet Diffusion Models
TL;DR: To our knowledge, this is the first systematic evaluation of Discrete Flow Matching for regulatory DNA sequence design.
Abstract: Flow matching and diffusion models have achieved strong performance in continuous data domains, but extending these methods to discrete biological sequences remains challenging. Recently, discrete generative frameworks have been proposed to address this limitation. Discrete Flow Matching (DFM) is a generative paradigm designed specifically for modeling discrete state spaces without continuous relaxation. While DFM has shown promising results in domains such as protein design and text generation, its applicability to regulatory DNA sequence design remains underexplored.
In this work, we investigate the use of Discrete Flow Matching for DNA sequence generation, focusing on promoter and enhancer design tasks. We benchmark our approach on three genomic datasets: human promoters, human melanoma enhancers, and Drosophila brain enhancers. We evaluate generation quality using both distributional metrics, such as Fréchet Biological Distance, and functional metrics based on predictive regulatory models. Our results show that DFM achieves competitive or superior performance compared to existing diffusion- and flow-based methods, particularly in unconditional enhancer generation and conditional promoter design. By operating directly on discrete nucleotide sequences, DFM avoids relaxation-induced artifacts and preserves biological sequence structure. These findings demonstrate that Discrete Flow Matching is a principled and effective framework for regulatory DNA sequence design. At the same time, our results reveal clear limitations of likelihood-based discrete flows in sampling rare, high-activity sequences, highlighting challenges in modeling extreme regions of complex activity landscapes. To our knowledge, this work provides the first systematic evaluation of Discrete Flow Matching for regulatory DNA sequence design across both promoter and enhancer settings.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 24