RabbitTrim: Highly Optimized Trimming of Illumina Sequencing Data on Multi-core Platforms

Published: 01 Jan 2024, Last Modified: 06 Aug 2024, Venue: ISBRA (2) 2024, License: CC BY-SA 4.0
Abstract: Trimmomatic is a de facto standard trimmer for Illumina sequencing data. However, its sub-optimal implementation prevents it from fully exploiting the computational power of common multi-core platforms. We therefore propose RabbitTrim, a highly optimized implementation of Trimmomatic based on efficient I/O strategies, parallel (de)compression engines, block-based memory pools, bitwise operations, and vectorization techniques. On a 48-core Intel server, RabbitTrim achieves speedups between 1.5x and 3.3x when processing plain FASTQ files and between 3.7x and 8.0x when processing gzip-compressed FASTQ files. Overall, RabbitTrim is able to process 101 GB of gzip-compressed sequencing data in only 5 min, while Trimmomatic requires at least 21 min. The source code is available at https://github.com/RabbitBio/RabbitTrim.
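
The abstract names bitwise operations as one of the optimizations. As a rough illustration of the general idea, the sketch below packs DNA bases into 2 bits each and counts mismatches between two equal-length sequences with a single XOR and a population count, instead of comparing characters one by one. The encoding table and the function names `pack` and `mismatches` are assumptions made for this example; they are not taken from the RabbitTrim or Trimmomatic source code, and the actual implementation may differ.

```python
# Hypothetical 2-bit base encoding, chosen for illustration only.
BASE_CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq):
    """Pack a DNA string into an integer, 2 bits per base.

    The first base ends up in the most significant bit pair.
    """
    word = 0
    for base in seq:
        word = (word << 2) | BASE_CODE[base]
    return word


def mismatches(a, b):
    """Count positions where two equal-length DNA strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    n = len(a)
    # XOR leaves a non-zero bit pair exactly where the bases differ.
    diff = pack(a) ^ pack(b)
    # Mask with a 1 in the low bit of every 2-bit pair: 0b0101...01.
    low_bits = int("01" * n, 2) if n else 0
    # Fold the high bit of each pair onto its low bit, then keep low bits only,
    # so each differing base contributes exactly one set bit.
    collapsed = (diff | (diff >> 1)) & low_bits
    # Population count of the collapsed word is the mismatch count.
    return bin(collapsed).count("1")
```

For example, `mismatches("ACGT", "AGCT")` compares all four bases in one XOR and reports two differing positions. In a compiled implementation the same pattern would typically operate on fixed-width 64-bit words with a hardware popcount instruction.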