SPRISS: approximating frequent k-mers by sampling reads, and applications

Diego Santoro, Leonardo Pellegrina, Matteo Comin, Fabio Vandin

2022 (modified: 28 Jan 2023)Bioinform. 2022Readers: Everyone

Abstract: The extraction of k-mers is a fundamental component in many complex analyses of large next-generation sequencing datasets, including reads classification in genomics and the characterization of RNA-seq datasets. The extraction of all k-mers and their frequencies is extremely demanding in terms of running time and memory, owing to the size of the data and to the exponential number of k-mers to be considered. However, in several applications, only frequent k-mers, which are k-mers appearing in a relatively high proportion of the data, are required by the analysis.

0 Replies