Regression Based Accuracy Estimation for Multiple Sequence Alignment

Luis Cedillo, Hector Richart Ruiz, Dan DeBlasio

Published: 24 May 2022, Last Modified: 09 Feb 2026, License: CC BY-SA 4.0
Abstract: Multiple sequence alignment plays an important role in many analyses. However, aligning multiple biological sequences is a complex task, so many tools have been developed to align sequences under a biologically inspired objective function. These tools require a user-defined parameter vector which, if chosen incorrectly, can greatly impact downstream analysis. Parameter Advising addresses this challenge of selecting input-specific parameter vectors by comparing alignments produced by a carefully constructed set of parameter configurations. In an ideal scenario, we would rank alignments based on their accuracy. In practice, however, we do not have a reference from which to calculate accuracy, so it becomes necessary to estimate the accuracy in order to rank the alignments. One solution involves the use of estimators such as Facet, which computes an estimate of accuracy as a linear combination of efficiently computable feature functions. In this work we introduce two new estimators called Lead (short for Learned accuracy estimator from large datasets), which use the same underlying feature functions as Facet but are built on top of highly efficient machine learning protocols, allowing us to take advantage of a larger training corpus.

Note about previous versions: A previous version of this paper was released on bioRxiv and presented the results of our previous study (Facet) with an error. This error has been corrected, and the conclusions have been updated based on the new data. This corrected version stands as the reference for anyone who may have encountered the versions with inaccuracies.
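
The parameter advising idea described in the abstract can be illustrated with a minimal Python sketch: score each candidate alignment with a linear combination of feature functions, then pick the parameter configuration whose alignment scores highest. Everything named here is an assumption made for illustration only. The two toy features (`gap_free_fraction`, `column_identity`), the weights, and the function names `estimate_accuracy` and `advise` are hypothetical and are not Facet's or Lead's actual features, coefficients, or API.

```python
from typing import Callable, Dict, List, Sequence

# An alignment is a list of equal-length rows, with '-' marking gaps.
Alignment = List[str]
Feature = Callable[[Alignment], float]


def gap_free_fraction(aln: Alignment) -> float:
    """Hypothetical feature: fraction of alignment cells that are not gaps."""
    total = sum(len(row) for row in aln)
    gaps = sum(row.count("-") for row in aln)
    return 1.0 - gaps / total if total else 0.0


def column_identity(aln: Alignment) -> float:
    """Hypothetical feature: fraction of columns whose non-gap residues all agree."""
    columns = list(zip(*aln))
    if not columns:
        return 0.0
    agreeing = sum(1 for col in columns if len(set(col) - {"-"}) == 1)
    return agreeing / len(columns)


def estimate_accuracy(
    aln: Alignment,
    features: Sequence[Feature],
    weights: Sequence[float],
    intercept: float = 0.0,
) -> float:
    """Linear-combination estimator: intercept + sum_i weights[i] * features[i](aln)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return intercept + sum(w * f(aln) for w, f in zip(weights, features))


def advise(
    candidates: Dict[str, Alignment],
    features: Sequence[Feature],
    weights: Sequence[float],
) -> str:
    """Return the name of the parameter configuration with the highest estimate."""
    if not candidates:
        raise ValueError("need at least one candidate alignment")
    return max(
        candidates,
        key=lambda name: estimate_accuracy(candidates[name], features, weights),
    )


if __name__ == "__main__":
    # Two made-up alignments of the same sequences under two made-up configurations.
    candidates = {
        "config_a": ["ACGT", "ACGT", "ACGT"],
        "config_b": ["AC-GT", "A-CGT", "ACG-T"],
    }
    features = [gap_free_fraction, column_identity]
    weights = [0.5, 0.5]  # illustrative weights, not learned from data
    print(advise(candidates, features, weights))
```

In this sketch the weights are fixed by hand; in an estimator of the kind the abstract describes they would instead be fit to training alignments whose true accuracy is known, which is where a larger training corpus becomes useful.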