Track: long paper (up to 8 pages)
Keywords: Flow matching, Gumbel-Softmax, protein design, DNA design
TL;DR: We introduce a generative framework for discrete biological sequence design that uses a temperature-controlled Gumbel-Softmax interpolant to enable smooth transport from noise to structured sequences.
Abstract: We introduce Gumbel-Softmax Score and Flow Matching, a generative framework that relies on a novel Gumbel-Softmax interpolant which, through a time-dependent temperature parameter, moves from smooth categorical distributions to distributions concentrated at a single vertex of the simplex. Using this interpolant, we explore Gumbel-Softmax Flow Matching by deriving a parameterized velocity field that transports smooth categorical distributions to the vertices of the simplex. We alternatively present Gumbel-Softmax Score Matching, which learns to regress the gradient of the probability density. Our approach supports controllable generation with tunable temperatures and stochastic Gumbel noise during inference, enabling efficient de novo sequence design. Our experiments demonstrate state-of-the-art performance in conditional DNA promoter design and strong results in de novo sequence-only protein generation.
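To make the idea of a temperature-controlled interpolant concrete, the following is a minimal NumPy sketch, not the authors' implementation. It perturbs a one-hot target with Gumbel noise and applies a softmax whose temperature shrinks over time. The function name `gumbel_softmax_interpolant`, the exponential schedule `tau_max * exp(-decay * t)`, and the `noise_scale` parameter are all assumptions chosen for illustration; the abstract only states that the temperature is time-dependent.

```python
import numpy as np


def gumbel_softmax_interpolant(x1_onehot, t, tau_max=10.0, decay=5.0,
                               noise_scale=0.1, rng=None):
    """Sample a point on the simplex between a smooth distribution and a vertex.

    x1_onehot: array of shape (..., K) holding one-hot target tokens.
    t: time in [0, 1]; small t gives high temperature (near-uniform output),
       large t gives low temperature (output near the target vertex).
    The schedule and noise_scale are hypothetical, illustrative choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumed schedule: temperature decays exponentially from tau_max.
    tau = tau_max * np.exp(-decay * t)
    # Standard Gumbel(0, 1) noise, one draw per category.
    g = rng.gumbel(size=x1_onehot.shape)
    logits = (x1_onehot + noise_scale * g) / tau
    # Numerically stable softmax over the category axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    # Three positions over a 4-letter alphabet (e.g. DNA), targets A, G, T.
    x1 = np.eye(4)[[0, 2, 3]]
    rng = np.random.default_rng(0)
    for t in (0.0, 0.5, 1.0):
        print(t, gumbel_softmax_interpolant(x1, t, rng=rng).round(3))
```

Under these assumed settings, samples at `t = 0` stay close to the uniform distribution, while samples at `t = 1` sit near the one-hot vertex of the target token; varying the temperature or the noise scale at inference time is one way to picture the controllable generation the abstract refers to.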
Submission Number: 134