Sample Complexity of Branch-length Estimation by Maximum Likelihood

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: We consider the branch-length estimation problem on a bifurcating tree: a character evolves along the edges of a binary tree according to a two-state symmetric Markov process, and we seek to recover the edge transition probabilities from repeated observations at the leaves. This problem arises in phylogenetics and is related to latent tree graphical model inference. In general, the log-likelihood function is non-concave and may admit many critical points. Nevertheless, simple coordinate maximization is known to perform well in practice, defying the complexity of the likelihood landscape. In this work, we provide the first theoretical guarantee as to why this might be the case. We show that, deep inside the Kesten-Stigum reconstruction regime, given polynomially many samples $m$ (assuming the tree is balanced), there exists a universal parameter regime (independent of the size of the tree) in which the log-likelihood function is strongly concave and smooth with high probability. On this high-probability likelihood landscape event, we show that the standard coordinate maximization algorithm, started from a sufficiently close initial point, converges exponentially fast to the maximum likelihood estimator, which is within $O(1/\sqrt{m})$ of the true parameter.
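To make the setup concrete, below is a minimal, self-contained sketch (not the authors' code) of the two-state symmetric model and coordinate maximization of the branch-length log-likelihood on a small balanced four-leaf tree. The tree layout, the helper names (`simulate`, `log_likelihood`, `coordinate_max`), and the use of a one-dimensional ternary search as the per-coordinate maximizer are illustrative assumptions rather than details from the paper.

```python
# Sketch: two-state symmetric (flip/no-flip) Markov process on a balanced
# 4-leaf binary tree, with coordinate maximization of the log-likelihood
# over the edge flip probabilities. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Balanced 4-leaf tree: root -> (a, b), a -> (leaf 0, leaf 1), b -> (leaf 2, leaf 3).
# Edge i has flip probability p[i] in (0, 0.5).
EDGES = [("root", "a"), ("root", "b"), ("a", 0), ("a", 1), ("b", 2), ("b", 3)]
CHILDREN = {"root": [("a", 0), ("b", 1)],   # node -> list of (child, edge index)
            "a": [(0, 2), (1, 3)],
            "b": [(2, 4), (3, 5)]}

def simulate(p, m):
    """Draw m i.i.d. leaf patterns (shape m x 4) from the symmetric model."""
    data = np.empty((m, 4), dtype=int)
    for t in range(m):
        state = {"root": int(rng.integers(2))}      # uniform root state
        for i, (u, v) in enumerate(EDGES):
            state[v] = state[u] ^ int(rng.random() < p[i])   # flip w.p. p[i]
        data[t] = [state[j] for j in range(4)]
    return data

def log_likelihood(p, data):
    """Felsenstein pruning, vectorized over the m leaf patterns."""
    m = data.shape[0]
    # L[node][t, s] = P(observed leaves below node in pattern t | node state = s)
    L = {leaf: np.stack([data[:, leaf] == 0, data[:, leaf] == 1], axis=1).astype(float)
         for leaf in range(4)}
    for node in ("a", "b", "root"):                 # post-order over internal nodes
        Lnode = np.ones((m, 2))
        for child, e in CHILDREN[node]:
            q = p[e]
            M = np.array([[1 - q, q], [q, 1 - q]])  # symmetric edge transition matrix
            Lnode *= L[child] @ M                   # M symmetric, so this is M @ L[child] rowwise
        L[node] = Lnode
    return np.log(0.5 * L["root"].sum(axis=1)).sum()   # uniform root distribution

def coordinate_max(data, p0, sweeps=10, eps=1e-6):
    """Cycle over edges, maximizing the log-likelihood one coordinate at a time
    by ternary search on (eps, 0.5 - eps)."""
    p = np.array(p0, dtype=float)
    for _ in range(sweeps):
        for e in range(len(EDGES)):
            def f(x):
                q = p.copy()
                q[e] = x
                return log_likelihood(q, data)
            lo, hi = eps, 0.5 - eps
            for _ in range(50):
                m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
                if f(m1) < f(m2):
                    lo = m1
                else:
                    hi = m2
            p[e] = 0.5 * (lo + hi)
    return p

if __name__ == "__main__":
    p_true = np.array([0.05, 0.08, 0.10, 0.12, 0.07, 0.09])
    data = simulate(p_true, m=2000)
    p_hat = coordinate_max(data, p0=np.full(6, 0.15))
    print("true:", p_true)
    print("est: ", np.round(p_hat, 3))
```

The ternary search stands in for an exact per-coordinate maximizer; it is only meaningful when each one-dimensional slice of the log-likelihood is unimodal, which is the kind of benign landscape structure the paper establishes (strong concavity and smoothness) in its high-probability regime.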
Lay Summary: In evolutionary biology, researchers typically use trees to describe how species evolve from common ancestors. Each branching in the tree represents a point where one species splits into two. Here we are interested in estimating how much genetic change happens along each branch of the tree. One challenge is that we can usually only observe the “leaves” — the current species — and not what happened inside the tree. Simple algorithms based on making one small improvement at a time often work surprisingly well in practice, although establishing this mathematically remains difficult because the landscape of possible choices is complex and full of local traps. Our research makes progress towards this goal. We show that if you collect enough data from the leaves, there’s a universal condition — no matter how big or complex the tree is — where the landscape becomes smooth and well-behaved. In this case, the simple algorithm converges quickly and reliably to the correct answer. This helps provide support for widely used tools in evolutionary analysis and latent tree models in machine learning.
Primary Area: Probabilistic Methods
Keywords: branch length estimation, statistical estimation on networks, coordinate maximization, sample complexity
Submission Number: 7403