Abstract: Evolutionary histories are uncertain and heterogeneous across the genome. As a result, for any given set of species, we may obtain a large number of incongruent trees. The differences among these trees are not limited to topology, as branch lengths are also uncertain and biologically heterogeneous. Yet, despite the biological significance of branch length differences, work on comparing and reconciling trees has predominantly focused on topology. To close this gap, we explore the problem of matching a query tree to a reference tree by assigning new branch lengths to the query tree. We formulate this objective as a least-squares optimization problem defined on the set of all pairwise distances. We prove that the problem is convex and thus can be solved optimally using standard tools. We also introduce dynamic programming algorithms that compute the required inputs to the optimization problem in quadratic time. We use this framework to estimate the branch lengths of a fixed species tree topology in units of the expected number of substitutions per site by matching it to gene trees whose branch lengths are in the same unit.
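To make the least-squares formulation concrete, the following is a minimal sketch in Python and not the authors' algorithm. It assumes an unweighted objective, a hypothetical four-leaf query topology ((A,B),(C,D)), and a naive construction of the leaf-pair by branch path matrix rather than the quadratic-time dynamic programming described in the abstract. The names `path_matrix` and `fit_branch_lengths` are illustrative. Each pairwise distance on the query tree is the sum of the branch lengths on the path between two leaves, so the objective is a quadratic function of the branch lengths, and the non-negativity constraint keeps the problem convex.

```python
from itertools import combinations

# Hypothetical query topology ((A,B),(C,D)). Each branch is identified by the
# set of leaves on one side of it (a bipartition of the leaf set).
LEAVES = "ABCD"
EDGES = [
    frozenset("A"), frozenset("B"), frozenset("C"), frozenset("D"),
    frozenset("AB"),  # the single internal branch
]

def path_matrix(leaves, edges):
    """One row per leaf pair, one column per branch.
    An entry is 1 when the branch separates the two leaves, else 0."""
    pairs = list(combinations(leaves, 2))
    rows = [[1.0 if (a in e) != (b in e) else 0.0 for e in edges]
            for a, b in pairs]
    return pairs, rows

def fit_branch_lengths(rows, target, sweeps=2000):
    """Minimize sum_k (rows[k] . x - target[k])^2 subject to x >= 0
    using cyclic coordinate descent (an assumed solver choice)."""
    n = len(rows[0])
    x = [0.0] * n
    resid = [-t for t in target]  # rows . x - target, starting from x = 0
    for _ in range(sweeps):
        for j in range(n):
            col = [r[j] for r in rows]
            curvature = sum(c * c for c in col)
            if curvature == 0.0:
                continue  # branch lies on no leaf-to-leaf path
            grad = sum(c * r for c, r in zip(col, resid))
            new = max(0.0, x[j] - grad / curvature)  # exact 1-D minimizer, clipped at 0
            delta = new - x[j]
            if delta != 0.0:
                resid = [r + delta * c for r, c in zip(resid, col)]
                x[j] = new
    return x

# Illustrative reference distances generated from made-up branch lengths
# on the same topology, so an exact fit exists.
true_lengths = [0.1, 0.2, 0.3, 0.4, 0.5]
pairs, rows = path_matrix(LEAVES, EDGES)
target = [sum(c * t for c, t in zip(r, true_lengths)) for r in rows]
fitted = fit_branch_lengths(rows, target)
print([round(v, 4) for v in fitted])
```

When the reference tree has a different topology, its distances are generally not additive on the query tree, and the same procedure returns the closest fit in the least-squares sense rather than an exact match. In practice a library solver such as `scipy.optimize.nnls` could replace the hand-written loop.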