Keywords: Phylogenetics, Generative Modeling, Flow Matching
TL;DR: Flow matching on Billera-Holmes-Vogtmann (BHV) tree space that moves random simple topologies to posterior-supported basins.
Abstract: Phylogenetic trees are hybrid objects: branch lengths vary continuously, while topologies change discretely through edge contractions and expansions. Billera-Holmes-Vogtmann (BHV) tree space provides a canonical geometry for this structure, representing each resolved topology as a Euclidean orthant and each topological change as motion across a shared lower-dimensional boundary. This raises a natural question for generative modeling: can neural transport models learn meaningful motion through tree space? We introduce PhylaFlow, a hybrid flow-matching model for posterior-basin transport in BHV tree space. PhylaFlow is trained on BHV geodesic paths from random starting trees to short-run posterior samples, coupling continuous branch-length motion within orthants with learned boundary events and discrete topological transitions. We use Bayesian posterior-basin recovery as an operational test of this learned geometry: if the flow reaches meaningful regions of tree space, then finite-budget Bayesian refinement initialized from, or guided by, its terminal trees should recover posterior-supported topologies more efficiently. Across the DS1-DS8 benchmarks, PhylaFlow substantially reduces initial Tree-KL relative to random, maximum-likelihood, maximum-parsimony, and UShER-based initializers. After finite-budget MrBayes refinement, direct PhylaFlow improves early and intermediate topology-recovery trajectories on most datasets, while split-guided PhylaFlow-MCMC refinement achieves the strongest results on the hard cases. Against strong posterior-informed and neural controls under the same refinement budget, the best PhylaFlow variant outperforms the short-warmup baseline on seven of eight datasets and PhyloGFN on five of eight. In a separate experiment with joint sequence conditioning, sequence embeddings steer the recovery of posterior splits, although results on exact posterior topology recovery remain preliminary.
These results show that hybrid flow matching can learn actionable transport in BHV tree space and provide a geometry-aware proposal mechanism for Bayesian phylogenetic inference.
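The following minimal Python sketch shows how flow-matching regression targets can be built along a path between two trees in BHV space. It is not the authors' implementation. Trees are encoded as dictionaries from splits (frozensets of leaf labels on one side of an edge) to strictly positive branch lengths. Several parts are assumptions made for illustration:

- The path is the simple "cone path," in which common splits change linearly while start-only splits shrink to zero and target-only splits grow from zero. This path is valid in BHV space but is generally not the geodesic; exact BHV geodesics require an algorithm such as Owen-Provan's.
- `cone_path_point`, `flow_matching_loss`, and the `model` callable are hypothetical names.

The time `t_star` at which both non-common groups vanish plays the role of a boundary event between orthants.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# Hypothetical encoding: a split is the set of leaf labels on one side of an edge,
# and a tree maps each of its splits to a strictly positive branch length.
Split = FrozenSet[str]
Tree = Dict[Split, float]


def _norm(tree: Tree, splits: Iterable[Split]) -> float:
    """Euclidean norm of the branch lengths of `tree` restricted to `splits`."""
    return math.sqrt(sum(tree[s] ** 2 for s in splits))


def cone_path_point(t0: Tree, t1: Tree, t: float) -> Tuple[Tree, Tree]:
    """Position x_t and velocity u_t at time t in [0, 1] on the cone path t0 -> t1.

    Common splits change linearly. Splits only in t0 shrink to zero and splits
    only in t1 grow from zero; both groups vanish at t_star = a / (a + b), where
    the path crosses the shared boundary. This is a valid BHV path but generally
    NOT the geodesic (an assumption made to keep the sketch short).
    """
    common = t0.keys() & t1.keys()
    only0 = t0.keys() - t1.keys()
    only1 = t1.keys() - t0.keys()
    a, b = _norm(t0, only0), _norm(t1, only1)
    x: Tree = {}
    u: Tree = {}
    for s in common:
        x[s] = (1 - t) * t0[s] + t * t1[s]
        u[s] = t1[s] - t0[s]
    for s in only0:  # a > 0 whenever only0 is non-empty (positive lengths)
        scale = ((1 - t) * a - t * b) / a
        if scale > 0:  # still inside the starting orthant
            x[s] = scale * t0[s]
            u[s] = -(a + b) / a * t0[s]
    for s in only1:  # b > 0 whenever only1 is non-empty
        scale = (t * b - (1 - t) * a) / b
        if scale > 0:  # already inside the target orthant
            x[s] = scale * t1[s]
            u[s] = (a + b) / b * t1[s]
    return x, u


def flow_matching_loss(
    model: Callable[[Tree, float], Tree], t0: Tree, t1: Tree, rng: random.Random
) -> float:
    """Squared-error conditional flow-matching loss for one (start, target) pair.

    `model` is a hypothetical velocity field mapping (x_t, t) to a velocity per
    split; splits missing from either dictionary count as zero velocity.
    """
    t = rng.random()
    x_t, u_t = cone_path_point(t0, t1, t)
    v = model(x_t, t)
    return sum((v.get(s, 0.0) - u_t.get(s, 0.0)) ** 2 for s in u_t.keys() | v.keys())


def zero_model(x_t: Tree, t: float) -> Tree:
    """Trivial velocity field used only to exercise the loss."""
    return {}


if __name__ == "__main__":
    pendants = {frozenset(leaf): 0.1 for leaf in "abcd"}
    t0 = {**pendants, frozenset("ab"): 1.0}  # topology ab|cd
    t1 = {**pendants, frozenset("ac"): 2.0}  # topology ac|bd
    x, u = cone_path_point(t0, t1, 0.5)
    print(x[frozenset("ac")], u[frozenset("ac")])  # 0.5 3.0
    print(flow_matching_loss(zero_model, t0, t1, random.Random(0)))  # 9.0
```

On this cone path the speed is constant, equal to sqrt((a + b)^2 + ||common_1 - common_0||^2). The zero model's loss therefore does not depend on the sampled time. A learned velocity field would replace `zero_model`, and a faithful version would substitute exact BHV geodesics for the cone path.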
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 75