Variational Phylogenetic Inference with Products over Bipartitions

Published: 01 May 2025, Last Modified: 18 Jun 2025, Venue: ICML 2025 poster, License: CC BY 4.0
TL;DR: We develop a variational inference approach for ultrametric phylogenetic trees that is differentiable, doesn't restrict tree space, and doesn't rely on MCMC subroutines.
Abstract: Bayesian phylogenetics is vital for understanding evolutionary dynamics, and requires accurate and efficient approximation of posterior distributions over trees. In this work, we develop a variational Bayesian approach for ultrametric phylogenetic trees. We present a novel variational family based on coalescent times of a single-linkage clustering and derive a closed-form density for the resulting distribution over trees. Unlike existing methods for ultrametric trees, our method performs inference over all of tree space, does not require any Markov chain Monte Carlo subroutines, and uses a differentiable variational family. Through experiments on benchmark genomic datasets and an application to the viral RNA of SARS-CoV-2, we demonstrate that our method achieves competitive accuracy while requiring significantly fewer gradient evaluations than existing state-of-the-art techniques.
Lay Summary: Understanding how species evolve over time often involves building evolutionary trees, or phylogenies, which show how different organisms are related. To do this accurately, scientists use statistical methods to estimate the most likely shapes of these trees based on genetic data. One popular but complex method is called Bayesian phylogenetics, which typically relies on slow and computationally intensive techniques. In recent years, researchers have developed faster alternatives using a method called variational inference, which approximates the range of possible tree shapes without relying on traditional, slower simulation methods. However, many of these existing approaches are still quite complex or limited in scope. In this study, we introduce a simpler and more efficient variational inference method for estimating evolutionary trees. Our technique models the timing of how species split from common ancestors and smoothly explores all possible tree shapes. It avoids the need for complicated sampling steps and can be easily optimized using modern tools. When tested on real-world genetic data—including data from the virus that causes COVID-19—the method achieved comparable accuracy to leading tools, while requiring much less computing power. This makes evolutionary analysis quicker and more accessible for researchers working with large genetic datasets.
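The abstract describes trees obtained from the coalescent times of a single-linkage clustering. As a rough illustration only, the sketch below shows how single-linkage clustering turns a matrix of pairwise times into an ultrametric tree. The function name `single_linkage_tree`, the toy matrix `times`, and the output format are assumptions made for this example. They are not taken from the paper or its code, and the sketch does not cover how the variational family parameterizes or samples those times.

```python
from itertools import combinations


def single_linkage_tree(times):
    """Build merge events from a symmetric matrix of pairwise times.

    times[i][j] is a hypothetical candidate coalescent time for taxa i and j.
    Returns a list of (time, left_leaves, right_leaves) tuples, ordered from
    the most recent merge to the oldest (the root).
    """
    n = len(times)
    parent = list(range(n))                      # union-find forest
    members = {i: frozenset([i]) for i in range(n)}

    def find(i):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []
    pairs = sorted(combinations(range(n), 2), key=lambda p: times[p[0]][p[1]])
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                             # already in the same cluster
        # The smallest remaining cross-cluster time sets the merge height.
        merges.append((times[i][j], members[ri], members[rj]))
        parent[rj] = ri
        members[ri] = members[ri] | members.pop(rj)
    return merges


# Toy, made-up pairwise times for four taxa (illustration only).
times = [
    [0.0, 0.3, 0.9, 1.2],
    [0.3, 0.0, 0.7, 1.5],
    [0.9, 0.7, 0.0, 1.1],
    [1.2, 1.5, 1.1, 0.0],
]
merges = single_linkage_tree(times)
for t, left, right in merges:
    print(t, sorted(left), sorted(right))
```

Each merge height is the minimum pairwise time between the two clusters being joined, so the heights never decrease toward the root and every leaf sits at the same distance from it, which is the ultrametric property. Any complete matrix of positive times yields a valid tree with n - 1 merges, which is one way to see how a distribution over such matrices could cover all tree topologies.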
Link To Code: https://github.com/EvanSidrow/VIPR
Primary Area: Probabilistic Methods->Variational Inference
Keywords: Phylogenetic Inference, Variational Bayes, COVID-19 Genetics, Linkage Clustering, Reinforce Estimators
Submission Number: 13788