Accelerating minimap2 for long-read sequencing applications on modern CPUs

Published: 01 Jan 2022, Last Modified: 13 May 2025. Venue: Nat. Comput. Sci. 2022. License: CC BY-SA 4.0
Abstract: Long-read sequencing is now routinely used at scale for genomics and transcriptomics applications. Mapping long reads or a draft genome assembly to a reference sequence is often one of the most time-consuming steps in these applications. Here we present techniques to accelerate minimap2, a widely used software tool for this task. We present multiple optimizations using single-instruction multiple-data (SIMD) parallelization, efficient cache utilization and a learned index data structure to accelerate the three main computational modules of minimap2: seeding, chaining and pairwise sequence alignment. These optimizations reduce the end-to-end mapping time of minimap2 by up to 1.8-fold while producing identical output. The accelerated version, mm2-fast, applies high-performance parallel computing techniques to reduce the overall runtime of minimap2.
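To make the "learned index" idea in the abstract concrete, here is a minimal Python sketch of the general technique: a model predicts where a key sits in a sorted array, and a short bounded search corrects the prediction. This illustrates the concept only and is not mm2-fast's implementation. The single linear model, the function names `build_learned_index` and `lookup`, and the toy integer keys (standing in for minimizer hash values) are all assumptions made for illustration.

```python
import bisect


def build_learned_index(keys):
    """Fit position ~= slope * key + intercept over sorted keys.

    Returns (slope, intercept, err), where err bounds how far the
    truncated prediction can be from the true position of any stored key.
    """
    n = len(keys)
    mean_key = sum(keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in keys)
    cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_pos - slope * mean_key
    max_err = max(abs(i - (slope * k + intercept)) for i, k in enumerate(keys))
    # +1 absorbs the truncation of the prediction to an integer in lookup().
    return slope, intercept, int(max_err) + 1


def lookup(keys, model, key):
    """Return the index of key in keys, or -1 if it is absent."""
    slope, intercept, err = model
    guess = int(slope * key + intercept)
    # Search only the window that the error bound guarantees for stored keys.
    lo = min(max(0, guess - err), len(keys))
    hi = max(lo, min(len(keys), guess + err + 1))
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else -1


# Hypothetical sorted keys standing in for minimizer hash values.
keys = [3, 8, 15, 16, 23, 42, 108, 4096]
model = build_learned_index(keys)
hits = [lookup(keys, model, k) for k in keys]
```

The appeal of this design is that the prediction replaces most of a binary search over the full array with arithmetic plus a search over a small window, which tends to be friendlier to caches when the key distribution is well approximated by the model. A skewed distribution, like the toy keys here, widens the window, which is why practical learned indexes typically use several models rather than one line.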