Hyper-k-mers: Efficient Streaming k-mers Representation

Igor Martayan; Lucas Robidou; Yoshihiro Shibuya; Antoine Limasset

Published: 01 Jan 2025, Last Modified: 10 Jul 2025. Venue: RECOMB 2025. License: CC BY-SA 4.0

Abstract: K-mers are fundamental in bioinformatics, notably for error handling in sequencing data. Counting them is memory-intensive due to their redundancy. Existing methods reduce redundancy via super-k-mers, yet inefficiencies persist. We introduce hyper-k-mers, a more compact representation, reducing duplication bounds from 6 to 4 bits per k-mer. We provide a theoretical space efficiency analysis and introduce KFC, a k-mer counting algorithm leveraging hyper-k-mers. KFC significantly reduces memory usage, scaling sub-linearly with k-mer size and outperforming state-of-the-art tools, particularly for large k. Availability: KFC is available at https://github.com/lrobidou/KFC, with supplementary scripts at https://github.com/imartayan/KFC_experiments and preprint at https://doi.org/10.1101/2024.11.06.620789.
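To make the redundancy argument concrete, here is a minimal sketch of the super-k-mer idea the abstract builds on: consecutive k-mers that share the same minimizer position are stored once as a single substring. It does not implement hyper-k-mers or KFC. The function names (`kmers`, `minimizer_pos`, `super_kmers`), the example sequence, and the use of plain lexicographic order to pick minimizers are assumptions made for illustration; practical tools typically order m-mers by a hash instead.

```python
def kmers(seq, k):
    """All k-mers of seq, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizer_pos(seq, start, k, m):
    """Absolute position of the minimizer of the k-mer starting at `start`.

    Assumption: the minimizer is the lexicographically smallest m-mer,
    with ties broken by the leftmost position.
    """
    return min(range(start, start + k - m + 1),
               key=lambda i: (seq[i:i + m], i))


def super_kmers(seq, k, m):
    """Split seq into maximal runs of consecutive k-mers sharing a minimizer.

    Requires len(seq) >= k. Each returned string covers one run.
    """
    n = len(seq) - k + 1          # number of k-mers in seq
    out = []
    run_start = 0
    prev = minimizer_pos(seq, 0, k, m)
    for s in range(1, n):
        cur = minimizer_pos(seq, s, k, m)
        if cur != prev:
            # The run holds the k-mers starting at run_start .. s-1.
            out.append(seq[run_start:s - 1 + k])
            run_start, prev = s, cur
    out.append(seq[run_start:n - 1 + k])
    return out


if __name__ == "__main__":
    seq, k, m = "ACGTACGTTGCAACGGT", 7, 3   # hypothetical input
    sk = super_kmers(seq, k, m)
    naive_bases = k * len(kmers(seq, k))    # every k-mer stored separately
    packed_bases = sum(len(s) for s in sk)  # k-mers stored as super-k-mers
    print(sk)
    print(naive_bases, packed_bases)
```

Under these assumptions a super-k-mer spans at most 2k - m bases, so overlapping k-mers share most of their bases, but bases at the boundaries between neighbouring super-k-mers are still stored more than once. That remaining duplication is the inefficiency the abstract refers to.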
