Gumbel-Softmax Score and Flow Matching for Discrete Biological Sequence Generation

Published: 06 Mar 2025, Last Modified: 26 Apr 2025, Venue: GEM, Readers: Everyone, License: CC BY 4.0
Track: Machine learning: computational method and/or computational results
Nature Biotechnology: Yes
Keywords: Flow matching, Gumbel-Softmax, protein design, DNA design
TL;DR: We introduce a generative framework for discrete biological sequence design that leverages a temperature-controlled Gumbel-Softmax interpolant to enable smooth transport from noise to structured sequences.
Abstract: We introduce Gumbel-Softmax Score and Flow Matching, a generative framework built on a novel Gumbel-Softmax interpolant that, through a time-dependent temperature parameter, connects smooth categorical distributions to distributions concentrated at a single vertex of the simplex. Using this interpolant, we explore Gumbel-Softmax Flow Matching by deriving a parameterized velocity field that transports smooth categorical distributions to the vertices of the simplex. We alternatively present Gumbel-Softmax Score Matching, which learns to regress the gradient of the probability density. Our approach supports controllable generation with tunable temperatures and stochastic Gumbel noise during inference, enabling efficient de novo sequence design. Our experiments demonstrate state-of-the-art performance in conditional DNA promoter design and strong results in de novo sequence-only protein generation.
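Illustration: the temperature-controlled interpolant described in the abstract can be sketched as follows. This is a minimal NumPy sketch and not the authors' implementation. The exponential temperature schedule, the function names, and the parameters `tau_max`, `decay`, and `noise_scale` are all assumptions introduced here, since the abstract does not specify them.

```python
import numpy as np


def temperature(t, tau_max=10.0, decay=3.0):
    """Hypothetical schedule (an assumption): high temperature at t=0, low at t=1."""
    return tau_max * np.exp(-decay * t)


def gumbel_softmax_interpolant(x1_onehot, t, rng, noise_scale=1.0,
                               tau_max=10.0, decay=3.0):
    """Return a point on the simplex for each position of a one-hot sequence.

    x1_onehot: array of shape (length, vocab) holding the clean one-hot tokens.
    t: time in [0, 1]; larger t means lower temperature, so more mass on one vertex.
    noise_scale: hypothetical knob scaling the Gumbel noise (0 disables it).
    """
    # Standard Gumbel(0, 1) noise via the inverse-CDF transform.
    u = rng.uniform(low=1e-10, high=1.0, size=x1_onehot.shape)
    gumbel = -np.log(-np.log(u))

    # Noisy logits divided by the time-dependent temperature.
    logits = (x1_onehot + noise_scale * gumbel) / temperature(t, tau_max, decay)

    # Numerically stable softmax over the vocabulary axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
tokens = np.array([0, 2, 1, 3])      # toy DNA sequence over a 4-letter alphabet
x1 = np.eye(4)[tokens]               # one-hot encoding, shape (4, 4)

x_early = gumbel_softmax_interpolant(x1, t=0.0, rng=rng, noise_scale=0.0)
x_late = gumbel_softmax_interpolant(x1, t=1.0, rng=rng, noise_scale=0.0)
x_noisy = gumbel_softmax_interpolant(x1, t=0.5, rng=rng)
```

With the noise switched off, `x_early` stays close to the uniform distribution while `x_late` puts most of its mass on the clean token, which matches the transport from smooth distributions toward simplex vertices described in the abstract. With noise on, the dominant vertex at a given position can differ from the clean token.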
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Sophia_Tang1
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 83