Sequence Design and Phylogenetic Inference with Generative Flow Networks

Published: 02 Mar 2026, Last Modified: 05 Mar 2026, Venue: GEM 2026, License: CC BY 4.0
Keywords: sequence design, phylogenetic inference, ancestral sequence reconstruction, GFlowNets
TL;DR: Joint Biomolecular Sequence Generation and Phylogenetic Inference with Generative Flow Networks
Abstract: Phylogenetic inference remains computationally challenging due to the exponentially growing tree topology search space, and current methods rely heavily on multiple sequence alignments (MSAs), which are expensive and error-prone. We propose AncestorGFN, a novel approach leveraging Generative Flow Networks (GFlowNets) for simultaneous sequence generation and phylogenetic inference without requiring MSAs. Our method learns to generate sequences matching a target distribution while the flow trajectories implicitly encode evolutionary relationships. We demonstrate that greedy traceback on maximum-flow trajectories recovers shared ancestral states, and evaluate on the let-7 microRNA family where the learned flow structure captures phylogenetic branching patterns. Furthermore, beam search at inference time discovers novel sequences clustering near known targets, suggesting applications in $\textit{de novo}$ sequence design. This work establishes a foundation for MSA-free phylogenetic inference using generative models.
Presenter: ~Qichen_Huang1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does not fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 24
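
The abstract's idea of greedy traceback along maximum-flow trajectories can be illustrated with a minimal sketch. This is not the AncestorGFN implementation: the state space (strings built by appending or prepending one nucleotide), the `toy_flow` edge-flow table, and the helper names `greedy_traceback` and `deepest_shared_state` are all hypothetical, chosen only to show how two terminal sequences might be traced back to a shared intermediate state.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]  # (parent_state, child_state); hypothetical representation

# Hypothetical edge flows on a tiny state graph. In a trained GFlowNet these
# values would come from the learned flow; here they are made up by hand.
toy_flow: Dict[Edge, float] = {
    ("", "U"): 4.0,
    ("", "G"): 1.0,
    ("U", "UG"): 3.0,    # append G
    ("G", "UG"): 0.5,    # prepend U
    ("UG", "UGA"): 1.5,  # append A
    ("UG", "UGC"): 1.5,  # append C
}


def greedy_traceback(terminal: str, edge_flow: Dict[Edge, float],
                     root: str = "") -> List[str]:
    """Walk from a terminal state back to the root, always following the
    incoming edge with the largest flow. Assumes edge_flow is acyclic."""
    path = [terminal]
    state = terminal
    while state != root:
        # Collect the flow on every recorded edge that enters the current state.
        incoming = {p: f for (p, c), f in edge_flow.items() if c == state}
        if not incoming:
            raise ValueError(f"no parent recorded for state {state!r}")
        state = max(incoming, key=incoming.get)
        path.append(state)
    return path


def deepest_shared_state(path_a: List[str], path_b: List[str]) -> Optional[str]:
    """Return the first state on path_a (ordered terminal -> root) that also
    lies on path_b, i.e. the shared state closest to the terminals."""
    on_b = set(path_b)
    for state in path_a:
        if state in on_b:
            return state
    return None


path_uga = greedy_traceback("UGA", toy_flow)
path_ugc = greedy_traceback("UGC", toy_flow)
shared = deepest_shared_state(path_uga, path_ugc)
```

In this toy graph both tracebacks pass through the state `UG`, which plays the role of a shared ancestral state; whether the method in the paper defines states, parents, or ancestors this way is an assumption of the sketch.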