UPP2: fast and accurate alignment of datasets with fragmentary sequences

Published: 01 Jan 2023, Last Modified: 15 May 2025, Bioinform. 2023, CC BY-SA 4.0
Abstract: Multiple sequence alignment (MSA) is a basic step in many bioinformatics pipelines. However, achieving highly accurate alignments on large datasets, especially those with sequence length heterogeneity, is a challenging task. Ultra-large multiple sequence alignment using Phylogeny-aware Profiles (UPP) is a method for MSA estimation that builds an ensemble of Hidden Markov Models (eHMM) to represent an estimated alignment on the full-length sequences in the input, and then adds the remaining sequences into the alignment using selected HMMs in the ensemble. Although UPP provides good accuracy, it is computationally intensive on large datasets.
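As a rough illustration of the first step described in the abstract, separating full-length sequences (which get aligned and modelled by the ensemble of HMMs) from the remaining sequences (which are added afterwards), the split might be sketched as below. This is a minimal sketch and not the UPP implementation: the function name `split_by_length`, the use of the median length as the reference, and the `tolerance` value of 0.25 are all assumptions made for illustration.

```python
from statistics import median


def split_by_length(seqs, tolerance=0.25):
    """Partition sequences into (full_length, remaining) by length.

    Hypothetical helper: a sequence counts as "full-length" here if its
    length lies within `tolerance` (as a fraction) of the median length.
    Both the criterion and the default value are illustrative assumptions.
    """
    med = median(len(s) for s in seqs.values())
    lo, hi = (1 - tolerance) * med, (1 + tolerance) * med
    # Sequences near the median length are candidates for the initial alignment.
    full_length = {name: s for name, s in seqs.items() if lo <= len(s) <= hi}
    # Everything else (e.g. fragments) would be added to the alignment later.
    remaining = {name: s for name, s in seqs.items() if name not in full_length}
    return full_length, remaining


if __name__ == "__main__":
    example = {
        "a": "ACGTACGTAC",
        "b": "ACGTACGTA",
        "c": "ACG",  # a short fragment
        "d": "ACGTACGTACG",
    }
    full, rest = split_by_length(example)
    print(sorted(full), sorted(rest))
```

In this toy input, only the three-letter fragment falls outside the length window, so it would be the one sequence added after the initial alignment is built.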